ABSTRACT


In  this  research  we  present  a  trust  region  algorithm  for  solving  the  equality 
constrained  optimization  problem.  This  algorithm  is  a  variant  of  the  1984  Celis- 
Dennis-Tapia  algorithm.  The  augmented  Lagrangian  function  is  used  as  a  merit 
function.  A  scheme  for  updating  the  penalty  parameter  is  presented.  The 
behavior  of  the  penalty  parameter  is  discussed. 

We  present  a  global  and  local  convergence  analysis  for  this  algorithm.  We 
also  show  that  under  mild  assumptions,  in  a  neighborhood  of  the  minimizer,  the 
algorithm will reduce to the standard SQP algorithm; hence the local rate of convergence of SQP is maintained.

Our  global  convergence  theory  is  sufficiently  general  that  it  holds  for  any 
algorithm  that  generates  steps  that  give  at  least  a  fraction  of  Cauchy  decrease  in 
the  quadratic  model  of  the  constraints. 
CHAPTER ONE


INTRODUCTION 

This  chapter  consists  of  two  parts.  In  the  first  part  we  define  the  general 
optimization  problem  and  some  special  cases  of  this  problem.  We  also  state  the 
optimality conditions for some of these special cases. The second part presents, from a historical point of view, some methods that attempt to solve the equality constrained optimization problem.

1.1  CLASSIFICATION  OF  THE  PROBLEMS 

By the general optimization problem we mean the problem of finding $x_* \in S$ that solves the following problem:

$$\text{minimize } f(x) \quad \text{subject to } x \in S, \tag{GOP}$$

where $f$ is assumed to be a smooth nonlinear function defined from $S$ into $\mathbb{R}$. A point $x_* \in S$ is said to be a local solution of problem (GOP) if there exists a neighborhood $N(x_*)$ such that $f(x_*) \le f(x)$ for all $x \in N(x_*) \cap S$.

The optimization problem can be characterized by the type of set $S$ on which $f$ is to be minimized. If $S$ is $\mathbb{R}^n$, then the problem will be referred to as an unconstrained optimization problem or problem (UCOP). It can be written as:

$$\text{minimize } f(x), \quad x \in \mathbb{R}^n, \tag{UCOP}$$

i.e., the unconstrained optimization problem is the problem of minimizing $f$ without constraints.

If $f \in C^1$, then a necessary condition for $x_* \in \mathbb{R}^n$ to be a solution of (UCOP) is

$$\nabla f(x_*) = 0,$$

where $\nabla f$ denotes the gradient of $f$. It is also necessary, if $f \in C^2$, that the Hessian of $f$ at $x_*$ be positive semidefinite.

Sufficient conditions for $x_* \in \mathbb{R}^n$ to be a local solution of (UCOP) are:

$$\nabla f(x_*) = 0, \qquad v^T \nabla^2 f(x_*)\, v > 0,$$

for all nonzero vectors $v \in \mathbb{R}^n$.

On the other hand, if $S$ can be defined by a set of equality and inequality constraints, then the problem will be called the general nonlinear programming problem. So, by the general nonlinear programming problem we mean the constrained optimization problem:

$$\begin{aligned} \text{minimize } \; & f(x), \\ \text{subject to } \; & h_i(x) = 0, \quad i = 1, \ldots, m, \\ & g_j(x) \ge 0, \quad j = 1, \ldots, p, \end{aligned} \tag{NLP}$$

where $f$, $h_i$, and $g_j$ are assumed to be smooth nonlinear functions defined from $\mathbb{R}^n$ into $\mathbb{R}$.

As a special case of this, if we seek to minimize $f$ on a manifold $S$ defined by equations of the form:

$$h_i(x) = 0, \quad i = 1, \ldots, m, \quad m < n,$$

i.e., we are concerned with the case where only equality constraints are involved, then we refer to this problem as the equality constrained optimization problem or problem (EQ), and it can be expressed as:

$$\text{minimize } f(x) \quad \text{subject to } h_i(x) = 0, \quad i = 1, \ldots, m. \tag{EQ}$$

On the other hand, we will refer to the problem in which only inequality constraints are involved as the inequality constrained optimization problem or problem (INEQ). It can be expressed as:

$$\text{minimize } f(x) \quad \text{subject to } g_j(x) \ge 0, \quad j = 1, \ldots, p. \tag{INEQ}$$

In this research, we consider only the equality constrained optimization problem, or problem (EQ). We will denote by $h(x)$ the vector whose components are $h_i(x)$, $i = 1, \ldots, m$. When $f$ and $h \in C^2$, we will say problem (EQ) $\in C^2$.

It is convenient to introduce the Lagrangian function $l : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ associated with problem (EQ). It is the function:

$$l(x, \lambda) = f(x) + \lambda^T h(x), \tag{1.1.1}$$

where $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ is called the Lagrange multiplier.

Stating necessary optimality conditions in terms of the Lagrangian function requires a constraint qualification. A satisfactory but somewhat restrictive constraint qualification is the regularity assumption: that is, the vectors $\nabla h_i(x)$, $i = 1, \ldots, m$, are linearly independent. Any feasible point at which the regularity assumption is satisfied is called a regular point. We will use the notation $\nabla h(x)$ to mean the matrix whose columns are $\nabla h_i(x)$, $i = 1, \ldots, m$.

The first order necessary conditions, or Kuhn-Tucker conditions, are that $x_*$ be a feasible point (i.e., $h(x_*) = 0$), and that there exists a Lagrange multiplier $\lambda_*$ such that:

$$\nabla_x l(x_*, \lambda_*) = 0.$$

The second order necessary condition is that the Hessian of the Lagrangian function is positive semidefinite for all vectors that lie in the null space of $\nabla h(x_*)^T$. That is, for all $v$ that satisfy $\nabla h(x_*)^T v = 0$, we have

$$v^T \nabla_x^2 l(x_*, \lambda_*)\, v \ge 0.$$

Sufficient conditions for $x_*$ to be an isolated local minimizer of problem (EQ) are that $x_*$ is a Kuhn-Tucker point (i.e., $x_*$ and $\lambda_*$ satisfy the first order necessary conditions), and that

$$v^T \nabla_x^2 l(x_*, \lambda_*)\, v > 0$$

for every nonzero vector $v$ that satisfies $\nabla h(x_*)^T v = 0$.
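As a small numerical illustration of these conditions, the following sketch checks feasibility, stationarity of the Lagrangian, and positive definiteness of the Hessian of the Lagrangian on the null space of $\nabla h(x)^T$ at a candidate pair $(x, \lambda)$. The test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$), the function names, and the tolerance are assumptions made for illustration only; they do not come from the text.

```python
import numpy as np

# Hypothetical test problem (not from the text):
#   minimize x1 + x2  subject to  h(x) = x1^2 + x2^2 - 2 = 0.
def grad_f(x):
    return np.array([1.0, 1.0])

def h(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])

def grad_h(x):
    # n x m matrix whose columns are the constraint gradients
    return np.array([[2.0 * x[0]], [2.0 * x[1]]])

def hess_l(x, lam):
    # Hessian in x of l(x, lam) = f(x) + lam^T h(x)
    return 2.0 * lam[0] * np.eye(2)

def check_sufficient_conditions(x, lam, tol=1e-10):
    A = grad_h(x)
    feasible = bool(np.linalg.norm(h(x)) <= tol)
    stationary = bool(np.linalg.norm(grad_f(x) + A @ lam) <= tol)
    # Rows of vt beyond rank(A) span the null space of grad_h(x)^T.
    _, sing, vt = np.linalg.svd(A.T)
    rank = int(np.sum(sing > tol))
    Z = vt[rank:].T
    reduced = Z.T @ hess_l(x, lam) @ Z
    curvature = bool(np.all(np.linalg.eigvalsh(reduced) > tol))
    return feasible, stationary, curvature

# Candidate minimizer (-1, -1) with multiplier 1/2.
at_minimizer = check_sufficient_conditions(np.array([-1.0, -1.0]), np.array([0.5]))
# The point (1, 1) with multiplier -1/2 is also a Kuhn-Tucker point, but it
# fails the second order condition (it is the constrained maximizer).
at_maximizer = check_sufficient_conditions(np.array([1.0, 1.0]), np.array([-0.5]))
```

In this example both points satisfy the first order conditions, and only the second order test distinguishes the minimizer from the maximizer.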

1.2  HISTORICAL  BACKGROUND 

In this section we present some methods that attempt to solve problem (EQ). We start with the sequential unconstrained minimization techniques, or (SUMT), which were popularized by A. Fiacco and G. McCormick in the late 1960s. Then we present the multiplier methods, which were prominent in the early 1970s. After that we present five different ways to extend Newton’s method from unconstrained optimization to constrained optimization.


1.2.1)  The  Penalty  Function  Methods 
Some of the earliest practical approaches for solving problem (EQ) were the sequential unconstrained minimization techniques or (SUMT). These techniques are based on solving a sequence of unconstrained minimization problems whose solutions approach the solution of problem (EQ). Penalty function methods belong to this class. [See Fiacco and McCormick (1968)]

Penalty function methods solve a sequence of minimization subproblems in which a "penalty" term for constraint violation is added to the objective function. The first penalty function was suggested by Courant (1943) for problem (EQ). It is the function:

$$P(x, r) = f(x) + \tfrac{1}{2}\, r\, h(x)^T h(x), \quad r > 0.$$

It can be shown under mild assumptions that if $x(r)$ is a minimizer of $P(x, r)$ for every $r$, then:

$$\lim_{r \to \infty} x(r) = x_*,$$

where $x_*$ is a solution to problem (EQ) [see for example Poljak (1971)]. The penalty function methods can be stated as follows:

ALGORITHM (1.2.1): Penalty Function Method

1) Given $x_0$, choose $r_0 > 0$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Find $x_{k+1}$ such that:

      $$x_{k+1} = \arg\min_x P(x, r_k).$$

   ii) Choose $r_{k+1} > r_k$.
These methods generate a sequence of infeasible points. In fact, each iterate is either necessarily infeasible or a solution of problem (EQ). These methods are not appropriate for problems in which feasibility must be maintained.

The availability of powerful methods for solving unconstrained optimization problems, the well-developed theoretical background, and the comparative simplicity of these methods are attractive. However, in practice it is inefficient to require that the sequence of unconstrained minimization problems be solved exactly, and the methods suffer from severe numerical difficulties, since the unconstrained problems that must be solved become increasingly ill-conditioned as the solution is approached.
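The infeasibility of the iterates and the growth in ill-conditioning can be observed on a tiny example. The sketch below applies the penalty iteration to a hypothetical equality constrained quadratic program (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$), for which each subproblem can be solved exactly by one linear solve. The problem data, the growth factor for $r$, and the function name are assumptions made for illustration; the penalty term uses the factor $\tfrac{1}{2}$ as in Courant's function.

```python
import numpy as np

def penalty_method_qp(Q, c, A, b, r0=1.0, growth=10.0, tol=1e-6, max_outer=30):
    """Penalty iteration for: minimize 0.5 x'Qx + c'x subject to Ax = b.

    Uses P(x, r) = f(x) + 0.5 * r * ||Ax - b||^2, whose minimizer solves
    (Q + r A'A) x = r A'b - c.
    """
    r = r0
    history = []
    for _ in range(max_outer):
        H = Q + r * A.T @ A                      # Hessian of P(., r)
        x = np.linalg.solve(H, r * A.T @ b - c)  # exact minimizer of P(., r)
        violation = np.linalg.norm(A @ x - b)
        history.append((r, x.copy(), violation, np.linalg.cond(H)))
        if violation <= tol:
            break
        r *= growth                              # choose r_{k+1} > r_k
    return x, history

# Hypothetical data: minimize x1^2 + x2^2 subject to x1 + x2 = 1.
Q = 2.0 * np.eye(2)
c = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_pen, hist = penalty_method_qp(Q, c, A, b)
```

For this problem the constraint violation shrinks like $1/(1+r)$ while the condition number of the subproblem Hessian grows like $1+r$, so every iterate is infeasible and the subproblems become harder to solve accurately as $r$ increases.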

1.2.2)  The  Multiplier  Method 

It is well known that in order to guarantee convergence of the penalty function methods the penalty parameter must go to infinity, and so the problem becomes increasingly ill-conditioned. Therefore, it would be useful to derive methods for which the parameters need only assume moderate values.

These concerns motivated Hestenes (1969) to introduce his multiplier method. He suggested the augmented Lagrangian function:

$$L(x, \lambda, r) = f(x) + \lambda^T h(x) + r\, h(x)^T h(x). \tag{1.2.1}$$

The multiplier method consists of updating an estimate of the Lagrange multiplier $\lambda$ and sometimes the penalty parameter at each iteration. The multiplier method can be stated as follows:


ALGORITHM (1.2.2): The Multiplier Method

1) Given $x_0 \in \mathbb{R}^n$ and $r_0 > 0$, determine $\lambda_0 \in \mathbb{R}^m$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Find $x_{k+1}$ such that:

      $$x_{k+1} = \arg\min_x L(x, \lambda_k, r_k).$$

   ii) Update $r_k$ by some update formula.

   iii) Update $\lambda_k$ by some multiplier update formula.

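As an update formula for the estimate of the multiplier, Hestenes (1969) and independently Powell (1969) suggested:

$$\lambda_{k+1} = \lambda_k + r_k\, h(x_k). \tag{1.2.2}$$

Haarhoff and Buys (1970) proposed:

$$\lambda_{k+1} = -\left(\nabla h_{k+1}^T \nabla h_{k+1}\right)^{-1} \nabla h_{k+1}^T \nabla f_{k+1}. \tag{1.2.3}$$

Buys (1972) proposed:

$$\lambda_{k+1} = \lambda_k + \left(\nabla h_k^T\, \nabla_x^2 L_k^{-1}\, \nabla h_k\right)^{-1} h_k. \tag{1.2.4}$$

Miele (1972) proposed:

$$\lambda_{k+1} = \left(\nabla h_{k+1}^T \nabla h_{k+1}\right)^{-1} \left(h_{k+1} - \nabla h_{k+1}^T \nabla f_{k+1}\right). \tag{1.2.5}$$

The formula:

$$\lambda_{k+1} = \left(\nabla h_k^T\, \nabla_x^2 L_k^{-1}\, \nabla h_k\right)^{-1} \left[\, h_k - \nabla h_k^T\, \nabla_x^2 L_k^{-1} \left(\nabla f_k + r_k \nabla h_k h_k\right) \right] \tag{1.2.6}$$

was suggested by Tapia (1974a), (1974b) in a different context.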
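The multiplier method with the Hestenes-Powell update (1.2.2) can be sketched on a small quadratic program with a fixed, moderate penalty parameter. The sketch below uses the augmented Lagrangian in the form (1.2.1), i.e. with the term $r\, h^T h$, and evaluates the constraint at the most recently computed minimizer, which is one reading of the update and is an assumption here. The problem data (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$), the value $r = 10$, and the function name are likewise illustrative assumptions.

```python
import numpy as np

def multiplier_method_qp(Q, c, A, b, lam0, r=10.0, tol=1e-10, max_iter=200):
    """Multiplier method for: minimize 0.5 x'Qx + c'x subject to Ax = b.

    Augmented Lagrangian: L(x, lam, r) = f(x) + lam'h(x) + r h(x)'h(x),
    with h(x) = Ax - b and a fixed penalty parameter r.
    """
    lam = np.asarray(lam0, dtype=float)
    H = Q + 2.0 * r * A.T @ A               # Hessian of L with respect to x
    for k in range(max_iter):
        # grad_x L = Qx + c + A'lam + 2r A'(Ax - b) = 0
        x = np.linalg.solve(H, 2.0 * r * A.T @ b - c - A.T @ lam)
        hx = A @ x - b
        if np.linalg.norm(hx) <= tol:
            break
        lam = lam + r * hx                  # multiplier update (1.2.2)
    return x, lam, k + 1

# Hypothetical data: minimize x1^2 + x2^2 subject to x1 + x2 = 1.
Q = 2.0 * np.eye(2)
c = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_mm, lam_mm, n_iter = multiplier_method_qp(Q, c, A, b, lam0=np.zeros(1))
```

On this example the multiplier error contracts by the constant factor $(1+r)/(1+2r)$ per iteration, so the method converges with $r$ held fixed, in contrast to the penalty method, where $r$ must grow without bound.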

A complete analysis of the multiplier methods was presented by Tapia (1977). Computational experience with the multiplier methods was reported by Miele et al. (1971a), (1971b), (1972a), and (1972b).

It is shown by Buys (1972) that if we define the dual of problem (EQ) to be the following problem:

$$\max_{\lambda} \; \min_{x} \; L(x, \lambda; r),$$

where $L$ is the augmented Lagrangian function (1.2.1) and $r$ is a sufficiently large fixed penalty parameter, then if $x_*$ solves the primal problem (i.e., $x_*$ solves problem (EQ)), its associated Lagrange multiplier $\lambda_*$ solves the dual problem, and $x_*$ can be obtained from $\lambda_*$ as the solution of

$$\min_{x} \; L(x, \lambda_*; r).$$

The multiplier method with multiplier update formula (1.2.2) or (1.2.3) is the gradient method applied to the dual problem; and the multiplier method with multiplier update formula (1.2.4) or (1.2.6) is Newton’s method applied to the dual problem [Buys (1972)].

Based on these facts, the rate of convergence of the multiplier method with a sufficiently large fixed penalty parameter using (1.2.2) or (1.2.3) as an update formula for the multiplier can be shown to be q-linear in $x$ and in $\lambda$. Additional results show that it is q-superlinear in $\lambda$ if and only if the penalty parameter goes to infinity [see for example Bertsekas (1976)]. On the other hand, the rate of convergence of the multiplier methods with a sufficiently large fixed penalty constant using (1.2.4) or (1.2.6) as an update formula for the multiplier can be shown to be q-quadratic in $x$ and in $\lambda$.
1.2.3) Newton’s Method For Problem (EQ)

Up to this point, from the historical and chronological points of view, we have seen that the price we pay for convergence in the penalty function methods is a deterioration in numerical conditioning, since the penalty parameter must go to infinity. The parameterized subproblem that has to be solved at each iteration in the multiplier method suffers from ill-conditioning, since the penalty parameter has to be set to a sufficiently large value. In the multiplier method using (1.2.2) or (1.2.3) as an update formula for the estimate of the multiplier, in order to guarantee fast convergence, again the penalty parameter must go to infinity, and the problem becomes increasingly ill-conditioned. Although the multiplier method using (1.2.4) or (1.2.6) gives fast local convergence, it still suffers from the fact that the subproblem requires a complete minimization in $x$ in order to get a step. To address these problems, an algorithm which would give fast convergence without a corresponding deterioration in numerical conditioning is needed. Such an algorithm is presented in this section.

Five different ways to extend Newton’s method from unconstrained optimization to constrained optimization have been suggested. These are the extended problem, the successive quadratic programming method, the diagonalized multiplier method, the structured multiplier substitution method, and Goodman’s method. This section discusses each of them in that order.
1.2.3.1) The Extended Problem:

Suppose problem (EQ) $\in C^2$. Let $x_*$ be a local solution which is also a regular point. The first order necessary conditions and the regularity assumption imply that there exists a Lagrange multiplier $\lambda_*$ such that $(x_*, \lambda_*)$ is a solution of the following nonlinear system:

$$\nabla_x l(x, \lambda) = 0, \qquad h(x) = 0. \tag{1.2.2}$$

Following Tapia (1977), (1978), by the extended system we mean the nonlinear system of equations (1.2.2), and by the extended problem corresponding to problem (EQ) we mean the problem of finding a stationary point of the Lagrangian function, i.e., solving for a root of the extended system. Now, consider applying Newton’s method to solve the extended problem. Our assumptions will be the standard assumptions of Newton’s method. Specifically, we assume the following:

(1) $f$ and $h \in C^2$.

(2) $\nabla^2 l(x_*, \lambda_*)$ is invertible.

(3) $\nabla^2 l$ is Lipschitz continuous with respect to $x$ in a neighborhood of the solution.

Newton’s method on the extended system can be stated as follows:

ALGORITHM (1.2.3): Newton’s Method on the Extended System

1) Given $x_0 \in \mathbb{R}^n$ and $\lambda_0 \in \mathbb{R}^m$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Solve for $(s, \Delta\lambda)$ the following linear system:

      $$\nabla_x^2 l_k\, s + \nabla h_k\, \Delta\lambda = -\nabla_x l_k,$$
      $$\nabla h_k^T s = -h_k.$$

   ii) Set: $x_{k+1} = x_k + s$.

   iii) Set: $\lambda_{k+1} = \lambda_k + \Delta\lambda$.

Under the standard assumptions of Newton’s method, this algorithm gives local q-quadratic convergence in $(x, \lambda)$. [See Tapia (1977)]
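A minimal sketch of Newton’s method on the extended system is given below. It assembles the block linear system for $(s, \Delta\lambda)$ and applies it to a hypothetical test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, with solution $x_* = (-1, -1)^T$ and $\lambda_* = 1/2$). The problem, the starting point, the stopping test, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def newton_extended(grad_f, hess_l, h, grad_h, x, lam, tol=1e-10, max_iter=50):
    """Newton's method on the system grad_x l(x, lam) = 0, h(x) = 0."""
    n, m = x.size, lam.size
    for _ in range(max_iter):
        A = grad_h(x)                    # n x m, columns are grad h_i(x)
        g = grad_f(x) + A @ lam          # grad_x l(x, lam)
        residual = np.concatenate([g, h(x)])
        if np.linalg.norm(residual) <= tol:
            break
        # Block matrix [[hess_x l, grad h], [grad h^T, 0]]
        K = np.block([[hess_l(x, lam), A], [A.T, np.zeros((m, m))]])
        step = np.linalg.solve(K, -residual)
        x, lam = x + step[:n], lam + step[n:]
    return x, lam

# Hypothetical problem: minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0.
grad_f = lambda x: np.array([1.0, 1.0])
h = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
grad_h = lambda x: np.array([[2.0 * x[0]], [2.0 * x[1]]])
hess_l = lambda x, lam: 2.0 * lam[0] * np.eye(2)

x_newton, lam_newton = newton_extended(
    grad_f, hess_l, h, grad_h, np.array([-1.2, -0.8]), np.array([0.4])
)
```

The starting point is deliberately chosen close to the solution, since the convergence result quoted above is only local.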

1.2.3.2) The Successive Quadratic Programming Method (SQP):

The successive quadratic programming method is effective for solving problem (EQ). Algorithms of this type compute the minimizer of problem (EQ) by solving a sequence of quadratic programming subproblems. Namely, by the successive quadratic programming method, or (SQP), we mean the iterative procedure:

ALGORITHM (1.2.4): The Successive Quadratic Programming Method

1) Given $x_0 \in \mathbb{R}^n$, determine $\lambda_0 \in \mathbb{R}^m$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Find a solution $(s_k^{QP}, \Delta\lambda_k^{QP})$ to the following quadratic programming problem:

      $$\text{minimize } \nabla_x l_k^T s + \tfrac{1}{2}\, s^T \nabla_x^2 l_k\, s \quad \text{subject to } h_k + \nabla h_k^T s = 0.$$

   ii) Set $x_{k+1} = x_k + s_k^{QP}$.

   iii) Set $\lambda_{k+1} = \lambda_k + \Delta\lambda_k^{QP}$.
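To see why this subproblem reproduces the Newton step on the extended system, one can write out the first order conditions of the quadratic program, with $\Delta\lambda$ playing the role of its multiplier. The short derivation below is a sketch that assumes $\nabla h_k$ has full column rank, so that the multiplier exists:

```latex
\begin{aligned}
\nabla_s\!\left[\nabla_x l_k^T s + \tfrac12\, s^T \nabla_x^2 l_k\, s
   + \Delta\lambda^T\!\left(h_k + \nabla h_k^T s\right)\right] = 0
 &\;\Longrightarrow\;
 \nabla_x^2 l_k\, s + \nabla h_k\, \Delta\lambda = -\nabla_x l_k, \\
h_k + \nabla h_k^T s = 0
 &\;\Longrightarrow\;
 \nabla h_k^T s = -h_k .
\end{aligned}
```

These are the two equations solved in step i) of Algorithm (1.2.3), so the quadratic program and the Newton step share the same stationary pair $(s, \Delta\lambda)$.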
1.2.3.3) The Diagonalized Multiplier Method (DMM):

At each iteration, the multiplier method described in Section 1.2.2 goes through a complete minimization step for $x$ and only one update for $\lambda$, although we are solving for both the minimizer $x_*$ and its associated multiplier $\lambda_*$. It would then make sense to update the estimate of the multiplier after each update of the minimizer. This idea motivated Tapia (1977) to introduce the diagonalized multiplier method. It can be written as follows:

ALGORITHM (1.2.5): The Diagonalized Multiplier Method

1) Given $x_0 \in \mathbb{R}^n$, determine $\lambda_0 \in \mathbb{R}^m$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Update $\lambda_k$ by some multiplier update formula.

   ii) Calculate:

      $$x_{k+1} = x_k - \nabla_x^2 l(x_k, \lambda_{k+1})^{-1}\, \nabla_x l(x_k, \lambda_{k+1}).$$

1.2.3.4) Structured Multiplier Substitution Method (SMSM):

Consider an estimate of the Lagrange multiplier of the form:

$$\lambda(x) = \left(\nabla h(x)^T D\, \nabla h(x)\right)^{-1} \left(h(x) - \nabla h(x)^T D\, \nabla f(x)\right), \tag{1.2.3}$$

where $D$ is any $n \times n$ positive semi-definite matrix that may depend on $x$. Then $\nabla_x L(x, \lambda(x)) = 0$ is equivalent to $(x, \lambda(x))$ being a stationary point of the augmented Lagrangian given by (1.2.1). This powerful fact motivated Tapia (1978) to introduce the multiplier substitution method. The idea is straightforward: solve for a root of the following problem:

$$\nabla_x L(x, \lambda(x)) = 0, \tag{1.2.4}$$

using any iterative scheme.

By the multiplier substitution Newton’s method, we mean the multiplier substitution method using Newton’s method as an iterative scheme to solve (1.2.4).

By the structured multiplier substitution method, we mean the multiplier substitution method taking advantage of omitting the terms that vanish at the solution. This method can be stated as an algorithm as follows:

ALGORITHM (1.2.6): Structured Multiplier Substitution Method

1) Given $x_0 \in \mathbb{R}^n$, determine $\lambda_0 \in \mathbb{R}^m$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Solve for $s$ the following linear system:

      $$(I - A_k D)\left[\nabla_x^2 l_k\, s + \nabla f_k\right] + \nabla h_k \left(\nabla h_k^T D\, \nabla h_k\right)^{-1} \left(\nabla h_k^T s + h_k\right) = 0,$$

      where $A(x) = \nabla h(x) \left(\nabla h(x)^T D\, \nabla h(x)\right)^{-1} \nabla h(x)^T$.

   ii) Set $x_{k+1} = x_k + s$.


The four methods discussed in this section are equivalent. Specifically, Tapia (1978) showed that for problem (EQ), the extended problem with the Lagrangian function given by (1.1.1), the successive quadratic programming method, the diagonalized multiplier method, and the structured multiplier substitution method generate identical $(x, \lambda)$ iterates.


1.2.3.5) Goodman’s Method (GM):

Let $x_* \in \mathbb{R}^n$ be a feasible point. If $z_1(x_*), \ldots, z_{n-m}(x_*)$ are a basis for the null space of $\nabla h(x_*)^T$, then a necessary condition for $x_*$ to be a local minimizer of problem (EQ) is

$$\nabla f(x_*)^T z_i(x_*) = 0, \quad i = 1, \ldots, n-m. \tag{1.2.5}$$

If we define $z_i(x)$ in a neighborhood of $x_*$, and let $Z(x)$ be the matrix whose columns are $z_i(x)$, $i = 1, \ldots, n-m$, then Goodman’s method can be defined to be the method that uses Newton’s method to solve the following $n \times n$ nonlinear system

$$Z(x)^T \nabla f(x) = 0, \qquad h(x) = 0.$$

This method can be stated as follows:

ALGORITHM (1.2.7): Goodman’s Method

1) Given $x_0 \in \mathbb{R}^n$, determine $\lambda_0 \in \mathbb{R}^m$.

2) For $k = 1, 2, \ldots$ until convergence do

   i) Form a basis $Z(x_k)$ for the null space of $\nabla h(x_k)^T$.

   ii) Find a solution $s$ to the following linear system:

      $$Z(x_k)^T W(x_k)\, s = -Z(x_k)^T \nabla f(x_k),$$
      $$\nabla h(x_k)^T s = -h(x_k),$$

      where $W(x_k)$ is the Hessian of the Lagrangian function $f(x) + h(x)^T \lambda(x_k)$.

   iii) Set $x_{k+1} = x_k + s$.
It is easy to see that for problem (EQ), Goodman’s method is equivalent to
the  successive  quadratic  programming  method  using  the  projection  formula  (1.2.3) 
to  update  the  estimate  of  the  Lagrange  multiplier.  [Goodman  (1985)] 
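The steps of Goodman’s method can be sketched as follows. The sketch builds an orthonormal null space basis from a complete QR factorization of $\nabla h(x_k)$, estimates the multiplier with the projection formula (1.2.3) using $D = I$, and solves the $n \times n$ linear system for $s$. The choice of QR for the basis, the choice $D = I$, the test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$), and all names are assumptions made for illustration.

```python
import numpy as np

def goodman_method(grad_f, hess_l, h, grad_h, x, tol=1e-10, max_iter=50):
    """Sketch of Goodman's method with an orthonormal null space basis."""
    for _ in range(max_iter):
        A = grad_h(x)                          # n x m, columns are grad h_i(x)
        n, m = A.shape
        g, hx = grad_f(x), h(x)
        Qfull, _ = np.linalg.qr(A, mode="complete")
        Z = Qfull[:, m:]                       # basis for null space of A^T
        if np.linalg.norm(np.concatenate([Z.T @ g, hx])) <= tol:
            break
        # Projection formula (1.2.3) with D = I (an assumption of this sketch)
        lam = np.linalg.solve(A.T @ A, hx - A.T @ g)
        W = hess_l(x, lam)                     # Hessian of f + h^T lam
        M = np.vstack([Z.T @ W, A.T])          # n x n coefficient matrix
        rhs = -np.concatenate([Z.T @ g, hx])
        x = x + np.linalg.solve(M, rhs)
    return x

# Hypothetical problem: minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0.
grad_f = lambda x: np.array([1.0, 1.0])
h = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
grad_h = lambda x: np.array([[2.0 * x[0]], [2.0 * x[1]]])
hess_l = lambda x, lam: 2.0 * lam[0] * np.eye(2)

x_goodman = goodman_method(grad_f, hess_l, h, grad_h, np.array([-1.2, -0.8]))
```

Note that no multiplier iterate is carried between iterations here; the estimate is recomputed from the current $x_k$ at every step.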

Of these equivalent formulations, the SQP method is the most visible and popular. The main reason for its popularity is that it allows inclusion of inequality constraints in a straightforward manner. To do so, one merely carries them along as linearized inequalities in the quadratic program. Another reason for its popularity is that the SQP approach allows use of existing quadratic programming modules in its implementation.

From a theoretical point of view, the extended problem plays a very important role and has been in the background of the derivation of many algorithms. This formulation is widely used for the convergence analysis of its equivalent methods.
CHAPTER TWO

GLOBALIZATION STRATEGIES

It is known that Newton’s method is locally q-quadratically convergent under reasonable hypotheses. This means that there exists a neighborhood of the solution such that if the starting point lies in that neighborhood, the sequence of iterates generated by the method will converge rapidly to that solution.

This chapter deals with modifications to such methods that attempt to force convergence to a solution from any starting point without sacrificing fast local convergence.

This chapter consists of two parts. In the first part we discuss the globalization strategy for Newton’s method by considering the unconstrained optimization problem. In Sections 2.1.1 and 2.1.2 we discuss in some detail the two main globalization strategies. The second part is devoted to a detailed study of the globalization strategy for problem (EQ). A crucial ingredient is the use of a merit function. In Section 2.2.1 we discuss some of the existing merit functions. In Section 2.2.2 we present some existing methods for solving problem (EQ).

2.1  GLOBALIZATION  STRATEGY  FOR  PROBLEM  (UCOP) 

We start our discussion of globalizing Newton’s method by considering the unconstrained optimization problem or problem (UCOP). In this section we discuss the two main globalization strategies: namely, the line search strategy and the model trust region strategy.

2.1.1)  Line  Search  Globalization  Strategy 

This is the modern version of the traditional idea of backtracking along Newton’s direction if a full Newton step is unsatisfactory.

The idea of the line search strategy is simple and natural. Let $s_k$ be Newton’s step at $x_k$. We take a step $\gamma_k s_k$, for some $\gamma_k > 0$, that makes $x_{k+1} = x_k + \gamma_k s_k$ an acceptable next iterate.

An acceptable step at least has to satisfy the so-called $\alpha$-condition

$$f(x_k + \gamma_k s_k) \le f(x_k) + \gamma_k\, \alpha\, \nabla f(x_k)^T s_k,$$

where $\alpha \in (0,1)$ is a small fixed constant. An additional condition may also be required. Different rules may be used to define an acceptable step. Some of these rules were studied by Armijo (1969), Goldstein (1967) and Wolfe (1969).

The convergence theory of such an algorithm shows that choosing $\gamma_k = 1$ whenever it is acceptable will not affect the fast local convergence [see Dennis and Moré (1977)]. This fact suggested an algorithm for choosing $\gamma_k$. The idea is simple: we start with $\gamma_k = 1$, and then, if $x_k + s_k$ is not acceptable, backtrack by decreasing $\gamma_k$ until an acceptable $x_k + \gamma_k s_k$ is found. This is precisely the backtracking algorithm.

ALGORITHM (2.1.1): The Backtracking Algorithm

Given $\alpha \in (0,1)$, $0 < l < u < 1$ and $\gamma_k = 1$

while $f(x_k + \gamma_k s_k) > f(x_k) + \gamma_k\, \alpha\, \nabla f(x_k)^T s_k$ do

   $\gamma_k := \rho\, \gamma_k$ for some $\rho \in [l, u]$

$x_{k+1} := x_k + \gamma_k s_k$.


For  more  details  concerning  line-search  strategies  we  refer  the  reader  to 
Dennis  and  Schnabel  (1983). 
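The backtracking algorithm can be sketched in a few lines. The version below uses a fixed reduction factor $\rho = 0.5$, which is one admissible choice in $[l, u]$; the default $\alpha = 10^{-4}$, the cap on the number of reductions, and the one-dimensional test function $f(x) = \sqrt{1 + x^2}$ are assumptions made for illustration. For that function the full Newton step from $x = 2$ overshoots badly, so backtracking is actually exercised.

```python
import numpy as np

def backtracking(f, grad_fx, x, s, alpha=1e-4, rho=0.5, max_reductions=50):
    """Return (x_new, gamma) satisfying the alpha-condition along step s."""
    fx = f(x)
    slope = float(grad_fx @ s)       # directional derivative, should be < 0
    gamma = 1.0                      # always try the full step first
    for _ in range(max_reductions):
        if f(x + gamma * s) <= fx + gamma * alpha * slope:
            return x + gamma * s, gamma
        gamma *= rho                 # backtrack
    raise RuntimeError("no acceptable step length found")

# Hypothetical test: f(x) = sqrt(1 + x^2), Newton step from x = 2.
f = lambda x: float(np.sqrt(1.0 + x[0] ** 2))
x0 = np.array([2.0])
grad0 = np.array([x0[0] / np.sqrt(1.0 + x0[0] ** 2)])
newton_step = np.array([-x0[0] * (1.0 + x0[0] ** 2)])   # -f'(x) / f''(x)

x_new, gamma = backtracking(f, grad0, x0, newton_step)
```

Here the full step and the half step are both rejected, and the step length $\gamma = 0.25$ is accepted, moving the iterate from $2$ to $-0.5$.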


2.1.2)  Trust  Region  Globalization  Strategy 


The idea of the trust region is based on estimating the region in which a local model of the function $f$ at $x_k$ can be trusted to adequately represent the function, and then taking the step which minimizes the model in this region.

Specifically, we build a local model of $f(x_k + s_k)$ at $x_k$, say $m_k(s_k)$, which at least satisfies the properties:

$$m_k(0) = f(x_k), \tag{2.1.1}$$

$$\nabla m_k(0) = \nabla f(x_k). \tag{2.1.2}$$

Given such a model and a trust region radius $\Delta_k$, we solve for $s_k$ the following optimization problem:

$$\text{minimize } m_k(s) \quad \text{subject to } \|s\|_2 \le \Delta_k.$$

If the model $m_k(s_k)$ is good enough, i.e., if

$$\frac{Ared_k}{Pred_k} \ge \eta_1, \tag{2.1.3}$$

where $\eta_1 \in (0,1)$ is a small fixed constant,
$$Ared_k = f(x_k) - f(x_k + s_k), \tag{2.1.4}$$

and

$$Pred_k = f(x_k) - m_k(s_k), \tag{2.1.5}$$

then we accept the step $s_k$ and set $x_{k+1} = x_k + s_k$.

If the local model $m_k(s_k)$ is convex, we obtain:

$$\nabla m_k(0)^T s_k \le m_k(s_k) - m_k(0).$$

This relation, using equalities (2.1.1) and (2.1.2), can be written as:

$$\nabla f(x_k)^T s_k \le m_k(s_k) - f(x_k). \tag{2.1.6}$$

Using (2.1.4) and (2.1.5), we can rewrite (2.1.3) as:

$$f(x_k + s_k) \le f(x_k) + \eta_1 \left( m_k(s_k) - f(x_k) \right),$$

which, because of (2.1.6), can be viewed as a relaxation of the $\alpha$-condition:

$$f(x_k + s_k) \le f(x_k) + \alpha\, \nabla f(x_k)^T s_k. \tag{2.1.7}$$

As a criterion used to accept or reject the step $s_k$, Moré and Sorensen (1983) use (2.1.3) and Dennis and Schnabel (1983) use (2.1.7).

If the step $s_k$ is rejected, then we set $x_{k+1} = x_k$ and decrease the radius of the trust region for the next iteration by choosing:

$$\Delta_{k+1} \in \left[\, \alpha_1 \|s_k\|_2,\; \alpha_2 \|s_k\|_2 \,\right],$$

where $0 < \alpha_1 < \alpha_2 < 1$.

On the other hand, if the step is accepted, we set $x_{k+1} = x_k + s_k$ and $\Delta_k$ is updated according to the following scheme:

If

$$\frac{Ared_k}{Pred_k} < \eta_2, \quad \text{where } \eta_2 \in (\eta_1, 1),$$

then the radius of the trust region is updated by setting:

$$\Delta_{k+1} = \min\left[\, \Delta_k,\; \alpha_3 \|s_k\|_2 \,\right], \quad \text{where } \alpha_3 > 1.$$

Else, if

$$\frac{Ared_k}{Pred_k} \ge \eta_2,$$

then we update $\Delta_k$ by setting:

$$\Delta_{k+1} = \max\left[\, \Delta_k,\; \alpha_3 \|s_k\|_2 \,\right].$$
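The acceptance test and the radius update described above can be sketched as a short loop. Several simplifications are assumptions of this sketch and are not prescribed by the text: the model is taken to be the quadratic $m_k(s) = f(x_k) + \nabla f(x_k)^T s + \tfrac12 s^T B_k s$ with $B_k$ the exact Hessian, the subproblem is solved only approximately by the Cauchy point (the model minimizer along the steepest descent direction inside the region), the rejected-step radius is set to $\alpha_1 \|s_k\|_2$, and the constants and test function are illustrative.

```python
import numpy as np

def cauchy_step(g, B, delta):
    """Minimizer of the quadratic model along -g subject to ||s||_2 <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = float(g @ B @ g)
    tau = 1.0 if gBg <= 0.0 else min(1.0, gnorm ** 3 / (delta * gBg))
    return -(tau * delta / gnorm) * g

def trust_region(f, grad, hess, x, delta=1.0, eta1=1e-4, eta2=0.75,
                 a1=0.25, a3=2.0, tol=1e-8, max_iter=500):
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        s = cauchy_step(g, B, delta)
        pred = -(g @ s + 0.5 * s @ B @ s)    # Pred_k = f(x_k) - m_k(s_k)
        ared = f(x) - f(x + s)               # Ared_k
        snorm = np.linalg.norm(s)
        ratio = ared / pred
        if ratio < eta1:                     # reject: keep x, shrink radius
            delta = a1 * snorm
        else:                                # accept the step
            x = x + s
            if ratio < eta2:
                delta = min(delta, a3 * snorm)
            else:
                delta = max(delta, a3 * snorm)
    return x

# Hypothetical test function with minimizer at the origin.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2 + 0.5 * x[0] ** 4
grad = lambda x: np.array([2.0 * x[0] + 2.0 * x[0] ** 3, 4.0 * x[1]])
hess = lambda x: np.diag([2.0 + 6.0 * x[0] ** 2, 4.0])

x_tr = trust_region(f, grad, hess, np.array([2.0, 1.0]))
```

Because the Cauchy point only uses the steepest descent direction, this sketch inherits the slow linear rate of steepest descent; a more accurate solution of the subproblem would be needed to recover the fast local convergence of Newton’s method.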


2.2  GLOBALIZATION  STRATEGY  FOR  PROBLEM  (EQ) 

Now, we consider the equality constrained optimization problem. In Section (1.2.3) we saw that the SQP algorithm is equivalent to Newton’s method on the extended system. So, it shares the advantages and the disadvantages of Newton’s method. On the positive side, it is locally q-quadratically convergent (if we use exact second-order information). On the negative side, it is not a globally convergent method: it converges only if the starting point is close enough to the solution, and it may not converge at all if the starting point is far away from the solution.

Before we start our discussion of the globalization strategy of methods that attempt to solve problem (EQ), we have to answer the following important question:

How do we test the step $s_k$ to see if it will make satisfactory progress towards the solution of problem (EQ) in going from $x_k$ to $x_k + s_k$?
The answer to this question is not easy. It takes us to the following section.

2.2.1)  Merit  Functions  For  Constrained  Optimization  Problem 

In the case of unconstrained optimization, it is sufficient to accept the step $s_k$ if $f(x_k + s_k)$ is smaller than $f(x_k)$ by an appropriate amount. However, for constrained optimization, there are two goals, which may not be compatible: first, to reduce the objective function $f(x)$, and, second, to move toward feasibility.

Of course, the real problem is to identify an appropriate merit function $\Phi$. This function should connect $f$ and $h$ in such a way that progress in the merit function means progress in solving the problem. There should be a connection between the merit function and the way the step is computed, in the sense that the step $s$ generated by the subproblem should give a decrease in the merit function. This decrease should be sufficient to lead to the solution of problem (EQ). It is preferred that $\Phi$ be smooth, free of arbitrary parameters, and inexpensive to evaluate. On the other hand, the merit function $\Phi$ should not disrupt the rapid rate of convergence of the basic method in a neighborhood of the solution.

We should accept the fact that no ideal merit function with all desirable properties yet exists. Some properties may only be obtained at the expense of others.

Although  many  merit  functions  have  been  suggested,  they  usually  suffer  from 
either  the  fact  that  they  involve  parameters  for  which  there  is  no  clear  choice,  or 
they  are  not  compatible  with  the  subproblem  from  which  the  step  is  computed. 

Now,  let  us  consider  some  merit  functions  that  have  been  suggested  to  force 
global  convergence. 

First, consider a class of merit functions that have the following form:

$$\Phi(x) = f(x) + W(h(x)), \tag{2.2.1}$$

where $W(h(x))$ is nonnegative for all $h \in \mathbb{R}^m$ and satisfies $W(0) = 0$. Special cases of this function are:

The least squares penalty function,

$$\Phi(x) = f(x) + r\, \|h(x)\|_2^2, \tag{2.2.2}$$

is used by Bartholomew-Biggs (1982). Celis, Dennis, and Tapia (1987) used this function as a relaxed merit function.

Han (1977b) used the following penalty function:

$$\Phi(x) = f(x) + r\, \|h(x)\|_1. \tag{2.2.3}$$

Many algorithms have employed such a function as a merit function [for example, see Powell (1978), Coleman and Conn (1982) and Byrd, Schnabel, and Shultz (1985)].

The function:

$$\Phi(x) = f(x) + r\, \|h(x)\|_2 \tag{2.2.4}$$

is used as a merit function by Byrd, Omojokun, Schnabel, and Shultz (1987).

This class of merit functions has a very useful property: if $r$ is any number satisfying $r > \|\lambda_*\|_\infty$, then $\Phi$ has a local minimum at $x_*$.

The merit function of the form (2.2.2) is differentiable. However, (2.2.3) and
(2.2.4) are not differentiable. A disadvantage of using a nondifferentiable merit
function is that it needs special methods to deal with the nondifferentiability, and
we lose the advantage of widely used, well developed algorithms that require
differentiability.

All merit functions of the form (2.2.1) suffer from the Maratos effect [Maratos
(1978)], which means that

$$ \Phi(x_k + s) > \Phi(x_k) \tag{2.2.5} $$

is possible even when the trial step $s$ makes great progress towards the solution.
The following example by Maratos (1978) explains this.


Example:

Consider the following problem:

$$ \begin{array}{ll} \text{minimize} & f(x) = -x_1 + 2\,( x_1^2 + x_2^2 - 1 ) , \\ \text{subject to} & x_1^2 + x_2^2 - 1 = 0 . \end{array} $$

The solution is $x_* = (1, 0)^T$.

The Hessian of the Lagrangian at the solution is the unit matrix.
Now, if $x_k$ is the point

$$ x_k = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} $$

for some angle $\theta$, then the SQP method using the unit matrix as an
approximation to the Hessian at $x_k$ gives the following search direction:

$$ s = \sin\theta \begin{pmatrix} \sin\theta \\ -\cos\theta \end{pmatrix} , $$

and we get

$$ \| x_k - x_* \|_2^2 = 2\,( 1 - \cos\theta ) , $$

$$ \| x_k + s - x_* \|_2^2 = ( 1 - \cos\theta )^2 . $$

However,

$$ f(x_k + s) > f(x_k) , $$

$$ h(x_k + s) > h(x_k) . $$

So, the trial step $s$ increases all merit functions of the form (2.2.1), even though
it has the quadratic rate of convergence because

$$ \| x_k + s - x_* \|_2 = \tfrac{1}{2} \, \| x_k - x_* \|_2^2 . $$

This  example  shows  that  an  algorithm  that  attempts  to  globalize  the  SQP  method 
and  employs  a  merit  function  of  the  form  (2.2.1)  may  reject  steps  similar  to  the 
step  s  in  the  last  example.  Consequently,  the  fast  local  rate  of  convergence  will 
be  disrupted. 
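
The Maratos example above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the angle, and the penalty value `r = 10` are illustrative assumptions, and the $l_1$ penalty function (2.2.3) is used as a representative of the class (2.2.1).

```python
import math

def f(x):
    # Objective of the Maratos example.
    return -x[0] + 2.0 * (x[0] ** 2 + x[1] ** 2 - 1.0)

def h(x):
    # Single equality constraint: the unit circle.
    return x[0] ** 2 + x[1] ** 2 - 1.0

def sqp_step(theta):
    # Closed-form SQP step from x_k = (cos(theta), sin(theta)) with B_k = I.
    return (math.sin(theta) ** 2, -math.sin(theta) * math.cos(theta))

def maratos_check(theta, r=10.0):
    # r is a hypothetical penalty parameter chosen for illustration.
    xk = (math.cos(theta), math.sin(theta))
    s = sqp_step(theta)
    xn = (xk[0] + s[0], xk[1] + s[1])
    x_star = (1.0, 0.0)
    err_old = math.dist(xk, x_star)
    err_new = math.dist(xn, x_star)
    # l1 penalty merit function (2.2.3) for one constraint.
    merit_old = f(xk) + r * abs(h(xk))
    merit_new = f(xn) + r * abs(h(xn))
    return err_old, err_new, merit_old, merit_new
```

For any angle with $\sin\theta \ne 0$ the new error equals half the square of the old error, while the merit function value increases, so a test based on (2.2.1) would reject the step.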

Another disadvantage of using a function of the form (2.2.1) as a merit
function is shown in the following example [Byrd, Schnabel, and Shultz (1985)].

Consider

$$ \begin{array}{ll} \text{minimize} & f(x) = 2 x_1 + \tfrac{1}{2} x_2^2 \\ \text{subject to} & x_1^2 + x_2^2 = 1 . \end{array} $$

The only local minimizer is at $x_* = (-1, 0)^T$ but there is a Kuhn-Tucker point
at $x_{**} = (1, 0)^T$ with Lagrange multiplier $\lambda_{**} = -1$.

The Hessian of the Lagrangian at that point is

$$ \nabla_x^2 l(x_{**}, \lambda_{**}) = \begin{pmatrix} -2 & 0 \\ 0 & -1 \end{pmatrix} $$

and $h(x_{**}) = 0$.

Assume that the algorithm is of trust region SQP type (see Section (2.2.2) for the
definition of this algorithm). Let $x_k = (1, 0)^T$; then the algorithm will take a
step of the form $\Delta s$, where $\Delta$ is the radius of the trust region and $s = (0, 1)^T$
is in the direction of the negative gradient. However, a step of any length along
the direction $s$ will increase both the objective function and the absolute value of
the constraints. Therefore, any algorithm of that kind that passes by this point
will never leave it even though it is a maximizer.

To avoid the Maratos effect, some techniques have been suggested. The first,
the watchdog technique, is to relax the descent requirement on the merit function,
that is, to tolerate (2.2.5) at some iterations [see Chamberlain, Lemarechal,
Pedersen and Powell (1982)]. The second is to add to the step what is
called the second order correction [see for example Coleman and Conn (1982),
Fletcher (1982), (1984), Mayne and Polak (1982), Byrd, Schnabel, and Shultz
(1985)].

Adding  the  second  order  correction  also  takes  care  of  the  disadvantage  that 
was  described  in  the  last  example.  However,  it  adds  more  expense  to  the  trial 
step. 

Some other useful merit functions have the following general form:

$$ \Phi(x, \lambda) = f(x) + \lambda^T h(x) + W(h(x)) , \tag{2.2.6} $$

where $W$ is a continuously differentiable function such that $W(h(x))$ is
nonnegative for all $h \in R^m$ and $W(0) = 0$.

One of the advantages of using a merit function of this class is that it avoids
the Maratos effect that might happen if we employed one of the form (2.2.1).

One of the most natural and useful merit functions was suggested by
Hestenes (1969). It is the augmented Lagrangian function:

$$ \Phi(x, \lambda; r) = f(x) + \lambda^T h(x) + r \, \| h(x) \|_2^2 \tag{2.2.7} $$

where $\lambda \in R^m$.

It is well known that

$$ \Phi(x, \lambda_*; r) = f(x) + \lambda_*^T h(x) + r \, \| h(x) \|_2^2 $$

has a local minimum at $x_*$ when $r$ is sufficiently large, where $\lambda_*$ is the
Lagrange multiplier at the solution. Since $\lambda_*$ is not known except at the
solution $x_*$, an update formula for $\lambda$ must be used to approximate $\lambda_*$ during the
minimization calculation. In Section (1.2.2) we presented some update formulas
that have been suggested. We recall three of them:

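i) The projection formula:

$$ \lambda_{k+1} = -( \nabla h_{k+1}^T \nabla h_{k+1} )^{-1} \nabla h_{k+1}^T \nabla f_{k+1} . \tag{2.2.8} $$

ii) Miele's formula:

$$ \lambda_{k+1} = ( \nabla h_{k+1}^T \nabla h_{k+1} )^{-1} ( h_{k+1} - \nabla h_{k+1}^T \nabla f_{k+1} ) . \tag{2.2.9} $$

iii) Tapia's update formula:

$$ \lambda_{k+1} = ( \nabla h_k^T (\nabla_x^2 l_k)^{-1} \nabla h_k )^{-1} [ \, h_k - \nabla h_k^T (\nabla_x^2 l_k)^{-1} \nabla f_k \, ] . \tag{2.2.10} $$

The last formula is equivalent to

$$ \lambda_{k+1} = -( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T ( \nabla f_k + \nabla_x^2 l_k \, s_k ) , $$

where $s_k$ is the SQP step.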
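
The equivalence between Tapia's formula and its projected form can be illustrated on a small instance. The sketch below assumes two variables and a single constraint, so that $\nabla h_k$ is a vector and the inner inverses reduce to scalars; the matrix `B` stands for the Hessian of the Lagrangian (or an approximation to it), and all function names and numerical data are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def solve2(B, b):
    # Solve a 2x2 linear system by Cramer's rule.
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(b[0] * B[1][1] - B[0][1] * b[1]) / det,
            (B[0][0] * b[1] - b[0] * B[1][0]) / det]

def tapia_multiplier(B, a, grad_f, h):
    # (a^T B^{-1} a)^{-1} (h - a^T B^{-1} grad_f) for one constraint.
    return (h - dot(a, solve2(B, grad_f))) / dot(a, solve2(B, a))

def sqp_step(B, a, grad_f, lam):
    # From the first-order condition B s + grad_f + a * lam = 0.
    return solve2(B, [-(g + ai * lam) for g, ai in zip(grad_f, a)])

def projected_multiplier(B, a, grad_f, s):
    # -(a^T a)^{-1} a^T (grad_f + B s) for one constraint.
    Bs = matvec(B, s)
    return -dot(a, [g + b for g, b in zip(grad_f, Bs)]) / dot(a, a)
```

With the multiplier from `tapia_multiplier`, the step returned by `sqp_step` satisfies the linearized constraint, and `projected_multiplier` evaluated at that step returns the same multiplier.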

Fletcher (1973) proposed the differentiable exact penalty function

$$ \Phi(x; r) = f(x) + \lambda(x)^T h(x) + r \, \| h(x) \|_2^2 , $$

where $\lambda(x) = -( \nabla h(x)^T \nabla h(x) )^{-1} \nabla h(x)^T \nabla f(x)$. It is also used as a merit
function, with (2.2.8) to estimate the value of the multiplier, by Powell and Yuan
(1986). This function has the advantage that when the second order sufficiency
conditions are assumed and $r$ is sufficiently large, then the minimizer of
Fletcher's exact penalty function is a solution to problem (EQ). However, this
function is expensive to evaluate.

Another interesting merit function is proposed by Di Pillo and Grippo (1979).
It has the form:

$$ \Phi(x, \lambda; r) = f(x) + \lambda^T h(x) + r \, \| h \|_2^2 + \| M(x) ( \nabla f + \nabla h \, \lambda ) \|_2^2 \tag{2.2.11} $$

where $M(x)$ is a full rank matrix of order $m \times n$ or $n \times n$.

This function does not belong to the class of functions of the form (2.2.6).
However, it is appropriate to mention it here.

If $M(x) \nabla h(x)$ is an $m \times m$ nonsingular matrix for all $x$, then, under some
regularity and continuity assumptions, it can be shown that for sufficiently large
$r$, all local minimizers of (2.2.11) are solutions of the problem (EQ). [See
Bertsekas (1982)]

If we choose

$$ M(x) = ( \nabla h(x)^T \nabla h(x) )^{-1} \nabla h(x)^T $$

then $M(x) \nabla h(x) = I$. Thus the local minimizer of (2.2.11) and Fletcher's exact
penalty function are identical if $r$ is replaced by $r - \tfrac{1}{4}$. So, we can regard Di
Pillo and Grippo's merit function as a generalization of Fletcher's exact penalty
function.
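
The shift of the penalty parameter by $\tfrac{1}{4}$ can be seen from a short calculation. The following worked derivation is a sketch that assumes the choice of $M(x)$ above and minimizes (2.2.11) over $\lambda$ for fixed $x$; here $\lambda(x)$ denotes the multiplier estimate used in Fletcher's function.

```latex
% With M(x) = (\nabla h^T \nabla h)^{-1} \nabla h^T we have
% M \nabla h = I and M \nabla f = -\lambda(x), so
\begin{aligned}
\Phi(x,\lambda;r) &= f + \lambda^T h + r\,\|h\|_2^2 + \|\lambda - \lambda(x)\|_2^2 . \\
% Setting the gradient with respect to \lambda to zero:
0 &= h + 2\,(\lambda - \lambda(x))
  \quad\Longrightarrow\quad \lambda = \lambda(x) - \tfrac{1}{2}\,h . \\
% Substituting back:
\min_{\lambda}\,\Phi(x,\lambda;r)
  &= f + \lambda(x)^T h - \tfrac{1}{2}\|h\|_2^2 + r\,\|h\|_2^2 + \tfrac{1}{4}\|h\|_2^2 \\
  &= f + \lambda(x)^T h + \left(r - \tfrac{1}{4}\right)\|h\|_2^2 .
\end{aligned}
```

The last line is Fletcher's function with $r$ replaced by $r - \tfrac{1}{4}$.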

Boggs and Tolle (1984) use the following exact penalty function:

$$ \Phi(x) = f(x) + \lambda(x)^T h(x) + r \, \| A^{1/2} h(x) \|_2^2 $$

where $A(x) = ( \nabla h(x)^T \nabla h(x) )^{-1}$ and $\lambda(x) = -( \nabla h(x)^T \nabla h(x) )^{-1} \nabla h(x)^T \nabla f(x)$.

It is quite interesting to notice that Boggs and Tolle's exact penalty function is
equivalent to the Lagrangian function (1.1.1) when $\lambda(x)$ is given by the following
relaxed Miele's update:

$$ \lambda_r(x) = ( \nabla h(x)^T \nabla h(x) )^{-1} ( \, r \, h(x) - \nabla h(x)^T \nabla f(x) \, ) . $$

In that sense we can say that $l(x, \lambda_r(x))$ is an exact penalty function.
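
The equivalence just stated follows in one line. The derivation below is a sketch that assumes the Lagrangian has the form $l(x,\lambda) = f(x) + \lambda^T h(x)$ and uses the symmetry and positive definiteness of $A(x)$.

```latex
\begin{aligned}
\lambda_r(x) &= A\,\bigl(r\,h - \nabla h^T \nabla f\bigr) = r\,A\,h + \lambda(x), \\
l\bigl(x,\lambda_r(x)\bigr)
  &= f + \lambda_r(x)^T h
   = f + \lambda(x)^T h + r\,h^T A\,h
   = f + \lambda(x)^T h + r\,\|A^{1/2} h\|_2^2 .
\end{aligned}
```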
Boggs and Tolle's function, Fletcher's exact penalty function, and Di Pillo
and Grippo's function share the disadvantage that they contain first derivatives,
so their second derivatives will be either impossible or very expensive to evaluate.

Schittkowski (1983) and Gill, Murray, Saunders, and Wright (1986) use as a
merit function the augmented Lagrangian (2.2.7), in which the Lagrange
multiplier is treated as a separate variable. They use the following
scheme to update the Lagrange multiplier:

$$ \lambda_+ = \alpha \mu + (1 - \alpha) \lambda_c , \qquad \alpha \in (0, 1) , $$

where $\mu = \lambda^{QP}$ and starting with $\lambda = \mu_1$.

Celis,  Dennis  and  Tapia  (1984)  used  the  augmented  Lagrangian  as  a  merit 
function.  They  fix  the  multiplier  during  the  process  of  testing  the  step  and 
update  it  after  accepting  the  step. 

Celis,  Dennis,  and  Tapia  (1987)  used  the  augmented  Lagrangian  as  a  primary 
merit  function  with  the  function  (2.2.2)  as  an  auxiliary  merit  function. 

In this research we will use the augmented Lagrangian as a merit function in
which the Lagrange multiplier is treated as a separate variable. We will use the
following formula to update the estimate of the Lagrange multiplier:

$$ \lambda_{k+1} = -( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T ( \nabla f_k + B_k s_k ) , $$

where $s_k$ in the formula is the trial step.

2.2.2)  Some  Existing  Methods 


Problem (EQ) is often solved by the Successive Quadratic Programming
(SQP) algorithm (see Section (1.2.3)). Namely, at the $k$-th iteration the step is
computed by solving the following quadratic programming subproblem:

$$ \begin{array}{ll} \text{minimize} & \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & h(x_k) + \nabla h(x_k)^T s = 0 , \end{array} \tag{QP} $$

where $B_k$ is the Hessian of the Lagrangian or an approximation to it.

The  local  convergence  analysis  for  the  SQP  algorithm  has  been  fairly  well 
established.  The  area  of  global  convergence  is  currently  receiving  much  attention. 

Many publications have considered globally convergent algorithms via merit
functions and line searches [for example, see Han (1977b), Fletcher (1981),
Bartholomew-Biggs (1982), Schittkowski (1983), Powell and Yuan (1984), Burke
and Han (1985), Boggs and Tolle (1986), Gilbert (1986) and Gill, Murray,
Saunders and Wright (1986)].

Schittkowski (1983) and Gill, Murray, Saunders, and Wright (1986) solve the
QP subproblem to get $s^{QP}$ and $\lambda^{QP}$. A steplength parameter $\alpha_k$ is obtained
by using a line search globalization strategy with the augmented Lagrangian as a
merit function; then the new iterate is defined to be

$$ x_{k+1} = x_k + \alpha_k \, s^{QP} , $$

$$ \lambda_{k+1} = \lambda_k + \alpha_k ( \lambda^{QP} - \lambda_k ) . $$

The idea behind this approach is that, since the variable $x$ is controlled by the
line search globalization strategy, the variable $\lambda$ has to be controlled by a form
of line search. This idea explains why they use for computing $\lambda_{k+1}$ a convex
combination of $\lambda_k$ and $\lambda^{QP}$.

Trust region approaches for unconstrained optimization have proven to be
very successful both theoretically and practically. The most natural way to
introduce the trust region idea to constrained optimization is to add a constraint which
restricts the size of the step in problem (QP). That is, at the $k$-th iteration we
solve the following trust region quadratic programming subproblem:

$$ \begin{array}{ll} \text{minimize} & \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & h(x_k) + \nabla h(x_k)^T s = 0 \\ & \| s \|_2 \le \Delta_k . \end{array} \tag{TRQP} $$

However, this approach may lead to inconsistent constraints because the
hyperplane $h(x_k) + \nabla h(x_k)^T s = 0$ may not intersect the sphere $\| s \|_2 \le \Delta_k$. Even
if they intersect, there is no guarantee that the trial step $s$ will sufficiently
decrease $\Phi$ and be accepted. So we may need to decrease the radius of the trust
region, and again we may get inconsistent constraints if $\Delta_k$ becomes too small.
Consequently, there will be no feasible region that satisfies both constraints, and
the model subproblem will not have a solution in the trust region.
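
The inconsistency described above is easy to exhibit for a single constraint, where the shortest step onto the linearized constraint has length $|h(x_k)| / \| \nabla h(x_k) \|_2$. The sketch below assumes one constraint only; the function name and the data in the usage note are hypothetical.

```python
import math

def trqp_is_consistent(h, grad_h, delta):
    # One linearized constraint h + grad_h^T s = 0 together with ||s||_2 <= delta.
    # The minimum-norm solution of the linearized constraint has length |h| / ||grad_h||_2.
    min_norm = abs(h) / math.sqrt(sum(g * g for g in grad_h))
    return min_norm <= delta
```

For example, with `h = 1.0` and `grad_h = [3.0, 4.0]` the shortest feasible step has length 0.2, so the two constraints are consistent for a radius of 0.25 but not for a radius of 0.1.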

It is easy to overcome this difficulty if the constraints are linear (i.e. for the
general linear equality and inequality constrained optimization problem). To do this,
simply maintain feasibility at each iteration by either projecting or restoring the
step to the feasible region. This can be done efficiently for linearly constrained
optimization problems. If we do that, at the $k$-th iteration the step will be
computed by solving the following subproblem:

$$ \begin{array}{ll} \text{minimize} & \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & \nabla h(x_k)^T s = 0 \\ & \| s \|_2 \le \Delta_k , \end{array} $$

which always has consistent constraints. [See Gay (1983)]

For nonlinear constraints, to overcome this difficulty, two main approaches
have been introduced. The first approach is to relax the constraints by
considering the following subproblem:

$$ \begin{array}{ll} \text{minimize} & \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & \alpha \, h(x_k) + \nabla h(x_k)^T s = 0 \\ & \| s \|_2 \le \Delta_k \end{array} $$

where $0 < \alpha < 1$. This approach has been applied by Vardi (1985) and Byrd,
Schnabel, and Shultz (1985).

Using this approach makes the problem always feasible in the sense that if we set
$\alpha = 0$ then the hyperplane $\alpha \, h(x_k) + \nabla h(x_k)^T s = 0$ will contain the current
point and consequently it will intersect with a trust region sphere of any radius.
However, this approach suffers from the disadvantage that the step depends on the
unknown parameter $\alpha$, for which there is no clear way of choosing a value.

An interesting way using this approach to compute a trial step that does not
depend on the parameter $\alpha$ was implemented by Byrd, Omojokun, Schnabel, and
Shultz (1987). They calculate $s$ by solving the following subproblem

$$ \begin{array}{ll} \text{minimize} & \nabla f(x_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & \nabla h(x_k)^T s = \nabla h(x_k)^T v \\ & \| s \|_2 \le \Delta_k , \end{array} $$

where $v$ solves the following problem

$$ \begin{array}{ll} \text{minimize} & \| h(x_k) + \nabla h(x_k)^T v \|_2 \\ \text{subject to} & \| v \|_2 \le \xi \Delta_k , \end{array} $$

where $0 < \xi < 1$.

The second approach is to add the trust region constraint to a somewhat
different problem. At the $k$-th iteration the step is taken to be the one which
minimizes the quadratic model of the Lagrangian and gives some decrease in
$\| h_k + \nabla h_k^T s \|_2$. This idea was first introduced by Celis, Dennis, and Tapia
(1984). At each iteration the step is computed by solving the following
subproblem:

$$ \begin{array}{ll} \text{minimize} & \nabla_x l(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & \| h(x_k) + \nabla h(x_k)^T s \|_2 \le \theta_k \\ & \| s \|_2 \le \Delta_k \end{array} \tag{CDT} $$

where $\theta_k$ is some positive constant that depends on $k$.

Celis, Dennis and Tapia (1984) chose $\theta_k$ to be $\| h_k + \nabla h_k^T s_k^{cp} \|_2$, where
$s_k^{cp} = -\alpha_k \nabla h_k h_k$ is the step to the Cauchy point, i.e., the minimizer in the
trust region $\{ s : \| s \|_2 \le \Delta_k \}$ of $\| h(x_k) + \nabla h(x_k)^T s \|_2$ along its negative
gradient. That is, the Celis-Dennis-Tapia step is chosen from the set of steps
from $x_k$ that are inside the trust region and give at least as much descent on the
2-norm of the residual of the linearized constraints as the Cauchy step.
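
The Cauchy step for the linearized constraints and the resulting value of $\theta_k$ can be computed directly from $h(x_k)$ and $\nabla h(x_k)$. The sketch below is an illustration under assumptions: the Jacobian is stored as an $n \times m$ list of rows, the step length along the steepest descent direction $-\nabla h_k h_k$ is the unconstrained minimizer unless it leaves the trust region, and the function names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def cauchy_step(jac, h, delta):
    # jac is grad h(x_k), an n-by-m matrix stored as n rows; h is h(x_k).
    n, m = len(jac), len(h)
    v = [sum(jac[i][j] * h[j] for j in range(m)) for i in range(n)]  # grad_h * h
    w = [sum(jac[i][j] * v[i] for i in range(n)) for j in range(m)]  # grad_h^T grad_h h
    nv, nw = norm(v), norm(w)
    if nv == 0.0:
        # The constraint model is already stationary; take the zero step.
        return [0.0] * n, norm(h)
    if nv ** 3 >= delta * nw ** 2:
        alpha = delta / nv             # the minimizer lies on the boundary
    else:
        alpha = nv ** 2 / nw ** 2      # the minimizer lies inside the region
    s = [-alpha * vi for vi in v]
    theta = norm([hj - alpha * wj for hj, wj in zip(h, w)])  # ||h + grad_h^T s||_2
    return s, theta
```

The returned `theta` is the residual of the linearized constraints at the Cauchy point, which is the value that the 1984 choice assigns to $\theta_k$.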

In 1986 Powell and Yuan introduced a different way of choosing $\theta_k$. They
chose it to be any number that satisfies

$$ \theta_k = \min [ \; \| h(x_k) + \nabla h(x_k)^T s \|_2 \; : \; \| s \|_2 \le \alpha \Delta_k , \; 0 < \alpha < 1 \; ] . $$

For any choice of $\theta_k$, if $s$ solves the CDT subproblem, then

$$ ( B_k + \mu I + \sigma \nabla h_k \nabla h_k^T ) \, s = -( \nabla_x l_k + \sigma \nabla h_k h_k ) , \tag{2.2.12} $$

$$ \| s \|_2 \le \Delta_k , $$

$$ \mu \, ( \Delta_k - \| s \|_2 ) = 0 , $$

$$ \| h(x_k) + \nabla h(x_k)^T s \|_2 \le \theta_k , $$

$$ \sigma \, ( \theta_k - \| h(x_k) + \nabla h(x_k)^T s \|_2 ) = 0 , $$

with $\mu , \sigma \ge 0$.

The approach of Fletcher (1984) is different. This approach uses an exact
penalty function with a trust region constraint. Let the linearized constraints be
$l(s)$ and the quadratic model of the Lagrangian be $q(s)$; then the $l_1$ exact
penalty function is formed as follows

$$ q(s) + \sum_{i=1}^{m} \mu_i \, | l_i(s) | . $$

At each iteration the step is computed by minimizing this $l_1$ exact penalty
function subject to a trust region constraint.
CHAPTER THREE

THE  TRUST  REGION  ALGORITHM 

This chapter is devoted to presenting in detail a variant of the 1984
Celis-Dennis-Tapia trust region algorithm for the equality constrained optimization
problem. Before we start our discussion about the algorithm, let us introduce some of
the notation that will be used in the rest of this thesis.

Notation 

The  trial  step  at  the  k^  iteration  is  denoted  by  $k  and  its  associated 
Lagrange  multiplier  by  AX*  .  If  the  step  is  accepted  it  will  be  denoted  by  sk 
and  its  associated  Lagrange  multiplier  by  AX*  . 

The terms $\nabla^2 h(x_k) \Delta\lambda$ and $\nabla^2 h(x_k) h(x_k)$ will appear in Chapters 4 and 5.
They are used to denote $\sum_{i=1}^{m} \nabla^2 h_i(x_k) \, \Delta\lambda_i$ and $\sum_{i=1}^{m} \nabla^2 h_i(x_k) \, h_i(x_k)$ respectively.

The matrix $B_k$ denotes $\nabla_x^2 l(x_k, \lambda_k)$ or an approximation to it.

3.1  DESCRIPTION  OF  THE  ALGORITHM 

The algorithm is iterative. At each iteration a trial step $s_k$ is obtained by
solving a model problem.

At any iteration indexed $k$, we try to update the estimate of the solution
$x_k$ to be the improved estimate $x_{k+1}$. To do this, the step $s_k^{QP}$ is computed by
solving the QP subproblem. If it exists and lies inside the trust region, i.e. if
$\| s_k^{QP} \|_2 \le \Delta_k$, then we set $s_k = s_k^{QP}$. Otherwise, if $x_k$ is feasible, we solve
the TRQP subproblem, and if $x_k$ is not feasible, we solve the CDT subproblem.
This can be stated as an algorithm as follows

ALGORITHM (3.1.1) Computing the Trial Step

Solve (QP) to get $s_k^{QP}$ and $\Delta\lambda_k^{QP}$.

If $\| s_k^{QP} \|_2 \le \Delta_k$
    then $s_k = s_k^{QP}$,
         $\Delta\lambda_k = \Delta\lambda_k^{QP}$.

Else, if $x_k$ is feasible
    then solve (TRQP).
         Set $s_k = s_k^{TRQP}$,
         $\Delta\lambda_k = -( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T ( \nabla_x l_k + B_k s_k^{TRQP} )$.

Else, solve (CDT).
         Set $s_k = s_k^{CDT}$,
         $\Delta\lambda_k = -( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T ( \nabla_x l_k + B_k s_k^{CDT} )$.

Let $s_k$ be the step computed by the algorithm and $\Delta\lambda_k$ be the
corresponding Lagrange multiplier step. We test whether the point $( x_k + s_k , \lambda_k + \Delta\lambda_k )$ is a
better approximation to the solution $( x_* , \lambda_* )$. In order to do this, we use, as a
merit function, the augmented Lagrangian (2.2.7).

Now, we test $( x_k + s_k , \lambda_k + \Delta\lambda_k )$ to determine whether it makes an
improvement in the merit function.

We define the actual reduction in the merit function in going from $( x_k , \lambda_k )$ to
$( x_k + s_k , \lambda_k + \Delta\lambda_k )$ by:

$$ \begin{aligned} Ared_k &= L(x_k, \lambda_k; r_k) - L(x_k + s_k, \lambda_k + \Delta\lambda_k; r_k) \\ &= l(x_k, \lambda_k) - l(x_k + s_k, \lambda_k + \Delta\lambda_k) + r_k \, [ \, \| h(x_k) \|_2^2 - \| h(x_k + s_k) \|_2^2 \, ] , \end{aligned} $$

which also can be written as:

$$ Ared_k = l(x_k, \lambda_k) - l(x_k + s_k, \lambda_k) - \Delta\lambda_k^T h(x_k + s_k) + r_k \, [ \, \| h(x_k) \|_2^2 - \| h(x_k + s_k) \|_2^2 \, ] . \tag{3.1.1} $$

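The calculation of the step $s_k$ is based on a quadratic approximation of the Lagrangian
function and a linear approximation to the constraints. Now by using the same
approximation we can compute the predicted reduction, which is defined by

$$ Pred_k = L(x_k, \lambda_k; r_k) - \Psi(x_k, s_k, \lambda_k, \Delta\lambda_k; r_k) , $$

where $\Psi(x_k, s_k, \lambda_k, \Delta\lambda_k; r_k)$ is an approximation to $L(x_k + s_k, \lambda_k + \Delta\lambda_k; r_k)$ and is
defined by:

$$ \begin{aligned} \Psi(x_k, s_k, \lambda_k, \Delta\lambda_k; r_k) = {} & l(x_k, \lambda_k) + \nabla_x l(x_k, \lambda_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k \\ & + \Delta\lambda_k^T [ \, h(x_k) + \nabla h(x_k)^T s_k \, ] \\ & + r_k \, \| h(x_k) + \nabla h(x_k)^T s_k \|_2^2 . \end{aligned} $$

Hence

$$ \begin{aligned} Pred_k = {} & L(x_k, \lambda_k; r_k) - [ \, l(x_k, \lambda_k) + \nabla_x l(x_k, \lambda_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k \, ] \\ & - \Delta\lambda_k^T ( \, h(x_k) + \nabla h(x_k)^T s_k \, ) - r_k \, \| h(x_k) + \nabla h(x_k)^T s_k \|_2^2 , \end{aligned} $$

which can be written as:

$$ \begin{aligned} Pred_k = {} & -\nabla_x l(x_k, \lambda_k)^T s_k - \tfrac{1}{2} s_k^T B_k s_k - \Delta\lambda_k^T ( \, h(x_k) + \nabla h(x_k)^T s_k \, ) \\ & + r_k \, [ \, \| h(x_k) \|_2^2 - \| h(x_k) + \nabla h(x_k)^T s_k \|_2^2 \, ] . \end{aligned} \tag{3.1.2} $$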
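
The two reductions (3.1.1) and (3.1.2) can be evaluated side by side. The following sketch is an illustration only: vectors are plain lists, `jac_t` holds $\nabla h(x_k)^T$ as $m$ rows, and all function and argument names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def aug_lagrangian(f, h, x, lam, r):
    # Augmented Lagrangian (2.2.7): f + lam^T h + r * ||h||_2^2.
    hx = h(x)
    return f(x) + dot(lam, hx) + r * dot(hx, hx)

def actual_reduction(f, h, x, lam, s, dlam, r):
    x_new = [xi + si for xi, si in zip(x, s)]
    lam_new = [li + di for li, di in zip(lam, dlam)]
    return aug_lagrangian(f, h, x, lam, r) - aug_lagrangian(f, h, x_new, lam_new, r)

def predicted_reduction(grad_l, B, hx, jac_t, s, dlam, r):
    # grad_l is the gradient of the Lagrangian at (x_k, lam_k); hx is h(x_k).
    Bs = [dot(row, s) for row in B]
    lin = [hi + dot(row, s) for hi, row in zip(hx, jac_t)]  # h + grad_h^T s
    return (-dot(grad_l, s) - 0.5 * dot(s, Bs) - dot(dlam, lin)
            + r * (dot(hx, hx) - dot(lin, lin)))
```

When the objective is quadratic, the constraints are linear, and $B_k$ is the exact Hessian of the Lagrangian, the model is exact and the two quantities coincide, which gives a convenient check of an implementation.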


We accept the step and set $x_{k+1} = x_k + s_k$ and $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$, if

$$ \frac{Ared_k}{Pred_k} \ge \eta_1 , $$

where $\eta_1 \in (0, 1)$ is a small fixed constant.

If the step is rejected, then we set $x_{k+1} = x_k$ and $\lambda_{k+1} = \lambda_k$ and decrease
the radius of the trust region by setting

$$ \Delta_{k+1} \in [ \, \alpha_1 \| s_k \|_2 , \; \alpha_2 \| s_k \|_2 \, ] , $$

where $0 < \alpha_1 < \alpha_2 < 1$. [See Dennis and Schnabel (1983)].

When the step is accepted, the trust region radius is updated by comparing
the value of $Ared_k$ with $Pred_k$. Namely, if

$$ \eta_1 \le \frac{Ared_k}{Pred_k} < \eta_2 , $$

where $\eta_2 \in (\eta_1, 1)$, then the radius of the trust region is updated by the rule:

$$ \Delta_{k+1} = \min [ \, \Delta_k , \; \alpha_3 \| s_k \|_2 \, ] , $$

where $\alpha_3 > 1$.

However, if $Ared_k / Pred_k \ge \eta_2$, then we increase the radius of the trust region by
setting:

$$ \Delta_{k+1} = \max [ \, \Delta_k , \; \alpha_3 \| s_k \|_2 \, ] . $$

This can be stated as an algorithm as follows:


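ALGORITHM (3.1.2) Testing the Step and Updating the Trust Region
Radius

If $Ared_k / Pred_k < \eta_1$,
    then set $x_{k+1} = x_k$,
             $\lambda_{k+1} = \lambda_k$,
             $\Delta_{k+1} \in [ \, \alpha_1 \| s_k \|_2 , \; \alpha_2 \| s_k \|_2 \, ]$.

Else, if $\eta_1 \le Ared_k / Pred_k < \eta_2$,
    then set $x_{k+1} = x_k + s_k$,
             $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$,
             $\Delta_{k+1} = \min [ \, \Delta_k , \; \alpha_3 \| s_k \|_2 \, ]$.   (3.1.3)

Else, if $Ared_k / Pred_k \ge \eta_2$ and $\| s_k^{QP} \|_2 > \Delta_k$ and $\alpha_4 \| s_k \|_2 > \Delta_k$,
    then we do only one internal doubling according to algorithm
    (3.1.3) below.

Else, set $x_{k+1} = x_k + s_k$,
          $\lambda_{k+1} = \lambda_k + \Delta\lambda_k$,
          $\Delta_{k+1} = \max [ \, \Delta_k , \; \alpha_3 \| s_k \|_2 \, ]$.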
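
The acceptance test and radius update can be sketched as a small function. This is a simplified illustration, not the full algorithm: the internal doubling branch is omitted, the default parameter values are hypothetical, and on rejection the upper end of the allowed interval is taken as the new radius.

```python
def update_radius(ratio, delta, step_norm,
                  eta1=1e-4, eta2=0.5, alpha2=0.5, alpha3=2.0):
    # ratio = Ared_k / Pred_k; returns (step_accepted, new_radius).
    # All default constants are illustrative assumptions.
    if ratio < eta1:
        # Reject: any radius in [alpha1*||s||, alpha2*||s||] is allowed;
        # this sketch picks alpha2*||s||.
        return False, alpha2 * step_norm
    if ratio < eta2:
        # Accept with moderate agreement between model and function.
        return True, min(delta, alpha3 * step_norm)
    # Accept with good agreement: allow the radius to grow.
    return True, max(delta, alpha3 * step_norm)
```

A caller would compute `ratio` from (3.1.1) and (3.1.2), move to $x_k + s_k$ only when the first returned value is true, and carry the second returned value forward as $\Delta_{k+1}$.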
In the case when $Ared_k / Pred_k \ge \eta_2$, $s_k \ne s_k^{QP}$, and $\alpha_4 \| s_k \|_2 > \Delta_k$, where
$\alpha_4 > 1$, we do only one internal doubling by setting $\Delta_k := \alpha_4 \| s_k \|_2$, and
if $\| s_k^{QP} \|_2 \le \Delta_k$, we take it as our trial step. Otherwise, we stay with the old
acceptable step and update the old trust region radius by the rule

$$ \Delta_{k+1} = \max [ \, \Delta_k , \; \alpha_3 \| s_k \|_2 \, ] . $$

This can be stated as an algorithm as follows
This  can  be  stated  as  an  algorithm  as  follows 


ALGORITHM (3.1.3) Internal Doubling

Set $\Delta_k = \alpha_4 \| s_k \|_2$.

If $\| s_k^{QP} \|_2 > \Delta_k$,
    then go back to the last acceptable step and the last corresponding
    trust region radius and update it by step (4) of algorithm (3.1.2).

Else, if $Ared_k / Pred_k < \eta_1$,
    then go back to the last acceptable step and the last corresponding
    trust region radius and update it by step (4) of algorithm (3.1.2).

Else, accept the step and update $\Delta_k$ according to step (1) or
    (4) of algorithm (3.1.2) above.


3.2  THE  ALGORITHM 

The outline of the algorithm is given below. It differs from the 1984
Celis-Dennis-Tapia algorithm in its way of updating the penalty parameter in step (3)
of the algorithm and in its way of updating the trust region radius in step (4).
Step (0)

    Set $x_0 \in R^n$, $B_0 \in R^{n \times n}$, $\lambda_0 \in R^m$,
    $r_{-1} = 1$, $\rho > 0$,
    $0 < \alpha_1 < \alpha_2 < 1 < \alpha_4 < \alpha_3$,
    $0 < \eta_1 < \eta_2 < 1$,
    $\epsilon > 0$, $\Delta_0 > 0$,
    and $k = 0$.

Step (1)

    If $\| P_k \nabla f_k \|_2 + \| h_k \|_2 \le \epsilon$, stop.

Step (2)

    Compute $s_k$ and $\Delta\lambda_k$ according to algorithm (3.1.1) above.

Step (3)

    Update the penalty parameter by the following scheme:

    Set $r_k = r_{k-1}$.

    If

    $$ Pred_k \ge \frac{r_k}{2} \, [ \, \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \, ] , $$

    go to step (4).

    Else, set

    $$ r_k = 2 \, \frac{ \nabla_x l_k^T s_k + \tfrac{1}{2} s_k^T B_k s_k + \Delta\lambda_k^T ( h_k + \nabla h_k^T s_k ) }{ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 } + \rho . $$

Step (4)

    Test the step and update $\Delta_k$ according to algorithm (3.1.2) above.

Step (5)

    Set $k := k + 1$ and go to step (1).
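
The penalty parameter update of step (3) depends only on two scalars and can be sketched as follows. The argument names, the default value of `rho`, and the guard against a zero decrease are assumptions made for this illustration.

```python
def update_penalty(r_prev, model_change, constraint_decrease, rho=1.0):
    # model_change        = grad_l^T s + 0.5 s^T B s + dlam^T (h + grad_h^T s)
    # constraint_decrease = ||h||_2^2 - ||h + grad_h^T s||_2^2  (nonnegative)
    r = r_prev
    pred = -model_change + r * constraint_decrease  # Pred_k from (3.1.2)
    if pred >= 0.5 * r * constraint_decrease:
        return r
    if constraint_decrease == 0.0:
        # Guard added for this sketch: no finite r can repair the test here.
        raise ValueError("no decrease in the linearized constraints")
    return 2.0 * model_change / constraint_decrease + rho
```

After an increase, $Pred_k$ evaluated with the new $r_k$ exceeds $\tfrac{r_k}{2} [ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 ]$ by $\tfrac{\rho}{2}$ times the constraint decrease, so the test of step (3) holds.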
CHAPTER FOUR

GLOBAL CONVERGENCE ANALYSIS


This chapter is devoted to the analysis of the global behavior of our
algorithm. Our global convergence theory is sufficiently general that it holds for any
algorithm that generates steps that give at least a fraction of Cauchy decrease in
the quadratic model of the constraints.

In the first part of this chapter we state the standard assumptions under
which the global convergence theory is proven. In the rest of the chapter we
address the global convergence theory of the algorithm. In Section 4.2 we prove
lemmas that deal with the predicted decrease of the function and of the model. In
Section 4.3 we prove lemmas that are needed to study the behavior of the penalty
parameter. Section 4.4 is devoted to studying the global convergence analysis of
the algorithm.

4.1  THE  STANDARD  ASSUMPTIONS 

It is clear that the behavior of our algorithm will depend on the conditions
we impose on the problem and on the matrices. We first state our assumptions:

1) There exists an open convex set $\Omega \subset R^n$ such that, for all $k$, $x_k$ and
$x_k + s_k \in \Omega$.

2) $f$ and $h_i \in C^2(\Omega)$.

3) There exists a positive constant $\Delta_*$ such that, for all $k$, $\Delta_k \le \Delta_*$.

4) $\nabla h(x)$ has full column rank for all $x \in \Omega$.

5) $f(x)$, $h(x)$, $\nabla h(x)$, $\nabla f(x)$, $\nabla^2 f(x)$, $( \nabla h(x)^T \nabla h(x) )^{-1}$ and each
$\nabla^2 h_i(x)$, for $i = 1, \ldots, m$, are all uniformly bounded in norm in $\Omega$.

6) The matrices $\{ B_k , \; k = 1, 2, \ldots \}$ have a uniform upper bound.

Remark 

Assumption (3) implies that all the trial steps are bounded. This assumption
is not a restrictive assumption. In fact, in our convergence theory we never state
that the radius of the trust region has to be increased. So we can set an upper
bound on the radius of the trust region inside the algorithm and our global
convergence theory holds.

4.2  SUFFICIENT  DECREASE  IN  THE  MODEL 

All  results  in  this  section  deal  with  the  reduction  of  the  merit  function  and 
the  predicted  reduction  of  the  model. 

In the following lemma we use the fact that the step $s_k$ is chosen to give at
least as much decrease in the linearization of the constraints as the Cauchy step.


Lemma (4.1)


Let $s_k$ be the step generated by the algorithm. Then there exist constants $b_1$
and $b_2$ such that for all $k$

$$ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \ge \frac{ \| h_k \|_2 }{ b_1 } \min \left[ \Delta_k , \; \frac{ \| h_k \|_2 }{ b_2 } \right] . $$

Proof 


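From the way of computing the step $s_k$, we have

$$ \begin{aligned} \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 &\ge \| h_k \|_2^2 - \theta_k^2 \\ &= \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k^{cp} \|_2^2 \\ &= -2 \, h_k^T \nabla h_k^T s_k^{cp} - ( s_k^{cp} )^T \nabla h_k \nabla h_k^T s_k^{cp} . \end{aligned} $$

From the definition of $s_k^{cp}$, we have

$$ s_k^{cp} = -\alpha_k \nabla h_k h_k , $$

where $\alpha_k$ is defined by

$$ \alpha_k = \frac{ \Delta_k }{ \| \nabla h_k h_k \|_2 } \quad \text{if} \quad \frac{ \| \nabla h_k h_k \|_2^3 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } \ge \Delta_k , \tag{4.2.1-a} $$

otherwise,

$$ \alpha_k = \frac{ \| \nabla h_k h_k \|_2^2 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } . \tag{4.2.1-b} $$

Consider the first case, i.e., the case when $s_k^{cp} = -\Delta_k \, \nabla h_k h_k / \| \nabla h_k h_k \|_2$. In this
case, using

$$ \frac{ \| \nabla h_k h_k \|_2^3 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } \ge \Delta_k , $$

we have

$$ \begin{aligned} \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 &\ge 2 \Delta_k \| \nabla h_k h_k \|_2 - \Delta_k^2 \, \frac{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 }{ \| \nabla h_k h_k \|_2^2 } \\ &\ge 2 \Delta_k \| \nabla h_k h_k \|_2 - \Delta_k \| \nabla h_k h_k \|_2 \\ &= \Delta_k \| \nabla h_k h_k \|_2 . \end{aligned} \tag{4.2.2} $$

Now, consider the second case. We have

$$ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \ge 2 \, \frac{ \| \nabla h_k h_k \|_2^4 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } - \frac{ \| \nabla h_k h_k \|_2^4 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } . $$

Hence,

$$ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \ge \frac{ \| \nabla h_k h_k \|_2^4 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } \ge \frac{ \| \nabla h_k h_k \|_2^2 }{ \| \nabla h_k \nabla h_k^T \|_2 } . \tag{4.2.3} $$

From (4.2.2) and (4.2.3), we can write

$$ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \ge \| \nabla h_k h_k \|_2 \min \left[ \Delta_k , \; \frac{ \| \nabla h_k h_k \|_2 }{ \| \nabla h_k \nabla h_k^T \|_2 } \right] . $$

Now, using the standard assumptions, since

$$ \| \nabla h_k h_k \|_2 \ge \frac{ \| h_k \|_2 }{ \| ( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T \|_2 } , $$

we can write

$$ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \ge \frac{ \| h_k \|_2 }{ \| ( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T \|_2 } \min \left[ \Delta_k , \; \frac{ \| h_k \|_2 }{ \| ( \nabla h_k^T \nabla h_k )^{-1} \nabla h_k^T \|_2 \, \| \nabla h_k \nabla h_k^T \|_2 } \right] . \tag{4.2.4} $$

Now from the standard assumptions there exist constants $b_1$ and $b_2$, where

$$ b_1 = \sup_{x \in \Omega} \| ( \nabla h(x)^T \nabla h(x) )^{-1} \nabla h(x)^T \|_2 $$

and

$$ b_2 = \sup_{x \in \Omega} \left[ \, \| ( \nabla h(x)^T \nabla h(x) )^{-1} \nabla h(x)^T \|_2 \, \| \nabla h(x) \nabla h(x)^T \|_2 \, \right] . $$

The rest of the proof follows immediately by substituting $b_1$ and $b_2$ into (4.2.4).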


Corollary  (4.2) 

Let $k$ be the index of any iteration; then the predicted decrease in the model by
the trial step satisfies

$$ Pred_k \ge \frac{ r_k }{ 2 } \, \frac{ \| h_k \|_2 }{ b_1 } \min \left[ \Delta_k , \; \frac{ \| h_k \|_2 }{ b_2 } \right] , $$

where $b_1$ and $b_2$ are as in Lemma (4.1).


Proof


From the way of updating the penalty parameter $r_k$ in step 3 of the algorithm,
we have

$$ Pred_k \ge \frac{ r_k }{ 2 } \, [ \, \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \, ] . $$

The rest of the proof follows immediately from the last lemma. ■


Lemma (4.1) shows that the way of choosing $\theta_k$ in the CDT subproblem
implies that we always get a fraction of Cauchy decrease in the constraints.

Corollary (4.2) shows that the way we update the penalty parameter ensures
that the predicted reduction at each iteration will be at least as much as a
fraction of Cauchy decrease.
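
The step length $\alpha_k$ used in the proof of Lemma (4.1) comes from a one-dimensional minimization along $-\nabla h_k h_k$. The worked derivation below is a sketch that spells out this step; the function $\varphi$ is notation introduced here for illustration.

```latex
\begin{aligned}
\varphi(\alpha) &= \| h_k - \alpha\, \nabla h_k^T \nabla h_k h_k \|_2^2
  = \| h_k \|_2^2 - 2\alpha\, \| \nabla h_k h_k \|_2^2
    + \alpha^2\, \| \nabla h_k^T \nabla h_k h_k \|_2^2 , \\
\varphi'(\alpha) &= 0
  \quad\Longrightarrow\quad
  \alpha = \frac{ \| \nabla h_k h_k \|_2^2 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } , \\
\| \alpha\, \nabla h_k h_k \|_2 &=
  \frac{ \| \nabla h_k h_k \|_2^3 }{ \| \nabla h_k^T \nabla h_k h_k \|_2^2 } .
\end{aligned}
```

If this length is at least $\Delta_k$, the minimizer along the ray within the trust region lies on the boundary, which gives $\alpha_k = \Delta_k / \| \nabla h_k h_k \|_2$; otherwise the unconstrained minimizer is used.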
Lemma (4.3)

If $s$ is the solution to the following problem

$$ \begin{array}{ll} \text{minimize} & g^T s + \tfrac{1}{2} s^T B s \\ \text{subject to} & \| s \|_2 \le \Delta \end{array} $$

for any $g \in R^n$ and any $n \times n$ symmetric matrix $B$, then

$$ g^T s \le -\tfrac{1}{2} \, \| g \|_2 \min \left[ \Delta , \; \frac{ \| g \|_2 }{ 2 \, \| B \|_2 } \right] . \tag{4.2.5} $$


Proof


The proof follows directly from Lemma (3.2) of Powell and Yuan (1986). However, for the sake of completeness we present a proof for the lemma.

If $\|g\|_2 = 0$, then (4.2.5) is trivial. So, let us consider the case when $\|g\|_2 > 0$.

If the trust region is not active then the step is computed from $B s = -g$. Hence we can write

$$s = -B^+ g + w,$$

where $B^+$ is the generalized inverse of $B$ and $w$ is a vector in the null space of $B$.

Since $g$ is in the range space of $B$, it follows that

$$g^T s = -g^T B^+ g \le -\frac{\|g\|_2^2}{\|B\|_2} \le -\frac{\|g\|_2^2}{4\,\|B\|_2}. \tag{4.2.6}$$
If the trust region is active then from Kuhn-Tucker theory there exists a multiplier $\mu \ge 0$ such that

$$(B + \mu I)\,s + g = 0. \tag{4.2.7}$$

Using the same argument as above, we can write

$$g^T s \le -\frac{\|g\|_2^2}{\|B + \mu I\|_2}.$$

But from (4.2.7), we have

$$\mu\,\|s\|_2 = \|B s + g\|_2 \le \|B\|_2\,\|s\|_2 + \|g\|_2,$$

or, since $\|s\|_2 = \Delta$,

$$\mu \le \|B\|_2 + \frac{\|g\|_2}{\Delta}.$$

Thus, we have

$$\|B + \mu I\|_2 \le \|B\|_2 + \mu \le 2\,\|B\|_2 + \frac{\|g\|_2}{\Delta} = \frac{2\,\|B\|_2\,\Delta + \|g\|_2}{\Delta}.$$

So,

$$g^T s \le -\frac{\|g\|_2^2}{\|B + \mu I\|_2} \le -\frac{\|g\|_2^2\,\Delta}{2\,\|B\|_2\,\Delta + \|g\|_2}.$$

From the last inequality and (4.2.6), we can write

$$g^T s \le -\frac{1}{2}\,\|g\|_2 \min\left[\Delta,\ \frac{\|g\|_2}{2\,\|B\|_2}\right].$$

Hence we get the desired result. ■
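Inequality (4.2.5) can be checked numerically on a small instance. The sketch below is only an illustration under a simplifying assumption that the lemma does not make: $B$ is taken to be diagonal with positive entries, so the subproblem can be solved by bisection on the multiplier $\mu$ of (4.2.7). All function names are hypothetical.

```python
import math


def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))


def solve_diag_trust_region(g, d, delta):
    """Minimize g.s + 0.5 * sum(d_i * s_i^2) subject to ||s||_2 <= delta,
    for B = diag(d) with every d_i > 0 (an assumption of this sketch)."""
    def step(mu):
        # Solution of (B + mu I) s = -g for a diagonal B.
        return [-gi / (di + mu) for gi, di in zip(g, d)]

    s = step(0.0)
    if norm2(s) <= delta:
        return s  # trust region inactive: B s = -g
    lo, hi = 0.0, norm2(g) / delta  # ||step(hi)|| <= ||g|| / hi = delta
    for _ in range(200):  # bisection on the multiplier mu
        mid = 0.5 * (lo + hi)
        if norm2(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi)


def decrease_bound(g, d, delta):
    """Right-hand side of (4.2.5) for B = diag(d)."""
    gnorm = norm2(g)
    bnorm = max(abs(di) for di in d)  # spectral norm of a diagonal matrix
    return -0.5 * gnorm * min(delta, gnorm / (2.0 * bnorm))
```

For example, with `g = [1.0, -2.0]` and `d = [1.0, 4.0]`, the value of $g^T s$ at the computed step stays below `decrease_bound(g, d, delta)` both when the radius is binding and when it is not.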


Corollary  (4.4) 


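For any step $s_k$ generated by the algorithm, let $s_k^t = P_k s_k$ and $s_k^n = Q_k s_k$, where $P_k = I - \nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T$ and $Q_k = I - P_k$. Then $s_k^t$ solves the following problem:

minimize $[P_k(\nabla \ell_k + B_k s_k^n)]^T s + \frac{1}{2} s^T P_k B_k P_k s$

subject to $\|s\|_2 \le \bar{\Delta}_k$

where $\bar{\Delta}_k = \sqrt{\Delta_k^2 - \|s_k^n\|_2^2}$. Furthermore, $s_k^t$ satisfies

$$(\nabla \ell_k + B_k s_k^n)^T s_k^t \le -\frac{1}{2}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,\|B_k\|_2}\right].$$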


Proof 


The proof follows directly from Powell and Yuan (1986). However, for the sake of completeness we present a proof for this corollary.

Since $s_k = s_k^t + s_k^n$, $s_k^t$ solves the following problem:

minimize $\nabla \ell_k^T (s + s_k^n) + \frac{1}{2}(s + s_k^n)^T B_k (s + s_k^n)$

subject to $\nabla h_k^T s = 0$, $\|s + s_k^n\|_2 \le \Delta_k$.

The last problem is equivalent to

minimize $(\nabla \ell_k + B_k s_k^n)^T s + \frac{1}{2} s^T B_k s$

subject to $\nabla h_k^T s = 0$, $\|s\|_2 \le \bar{\Delta}_k$.

Since $s_k^t$ lies in the null space of $\nabla h_k^T$, $\nabla h_k^T s_k^t$ is always zero. Hence $s_k^t$ solves the last problem even if the constraint $\nabla h_k^T s = 0$ is deleted. That is, $s_k^t$ solves the following problem.

minimize $[P_k(\nabla \ell_k + B_k s_k^n)]^T s + \frac{1}{2} s^T P_k B_k P_k s$

subject to $\|s\|_2 \le \bar{\Delta}_k$.

Now using (4.2.5) and $\|P_k B_k P_k\|_2 \le \|B_k\|_2$, we get

$$(\nabla \ell_k + B_k s_k^n)^T s_k^t \le -\frac{1}{2}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,\|B_k\|_2}\right].$$

Hence we get the desired result. ■
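The decomposition used in Corollary (4.4) can be illustrated for the simplest case of a single constraint, where $\nabla h_k$ is one vector $a$ and $Q_k = a a^T / (a^T a)$. This is a minimal sketch with hypothetical helper names, not the general multi-constraint computation.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def split_step(a, s):
    """Return (s_t, s_n) with s_n = Q s and s_t = P s = s - s_n,
    where Q projects onto span{a} and P = I - Q."""
    coef = dot(a, s) / dot(a, a)
    s_n = [coef * ai for ai in a]
    s_t = [si - ni for si, ni in zip(s, s_n)]
    return s_t, s_n


def reduced_radius(delta, s_n):
    """Radius left for the tangential step: sqrt(delta^2 - ||s_n||^2)."""
    return math.sqrt(delta ** 2 - dot(s_n, s_n))
```

For `a = [3.0, 4.0]` and `s = [1.0, 0.0]` the normal part is `[0.36, 0.48]` and the tangential part is `[0.64, -0.48]`; the two parts are orthogonal, so $\|s_k^t\|_2^2 + \|s_k^n\|_2^2 = \|s_k\|_2^2$, which is why the tangential subproblem uses the reduced radius $\bar{\Delta}_k$.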


Lemma  (4.5) 

There exists a constant $c_1$ such that $\|\nabla \ell_k\|_2 \le c_1$.

Proof

Since

$$\lambda_k = \lambda_{k-t_k+1} = -(\nabla h_{k-t_k}^T \nabla h_{k-t_k})^{-1}\nabla h_{k-t_k}^T(\nabla f_{k-t_k} + B_{k-t_k} s_{k-t_k}),$$

where $s_{k-t_k}$ is the last acceptable step, we have

$$\|\lambda_k\|_2 \le \|(\nabla h_{k-t_k}^T \nabla h_{k-t_k})^{-1}\nabla h_{k-t_k}^T\|_2 \left[\|\nabla f_{k-t_k}\|_2 + \|B_{k-t_k}\|_2\,\|s_{k-t_k}\|_2\right].$$

The boundedness of $\|\lambda_k\|_2$ follows immediately from the standard assumptions.

Now, because $\|\nabla \ell_k\|_2 \le \|\nabla f_k\|_2 + \|\nabla h_k\|_2\,\|\lambda_k\|_2$, we can see that the proof of the lemma follows from the boundedness of $\|\lambda_k\|_2$ and the standard assumptions. ■


Lemma  (4.6) 
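For any $x_k$, $x_k + s_k \in \Omega$, we have:

$$|Ared_k - Pred_k| \le a_1\,\|s_k\|_2^2 + r_k\left[a_2\,\|s_k\|_2^3 + a_3\,\|h_k\|_2\,\|s_k\|_2^2\right],$$

where $a_1$, $a_2$, $a_3$ are constants independent of $k$.

Proof

From (3.1.1) and (3.1.2) we can write:

$$Ared_k - Pred_k = \left[\ell(x_k,\lambda_k) + \nabla_x \ell(x_k,\lambda_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k - \ell(x_k+s_k,\lambda_k)\right]$$
$$\qquad + \Delta\lambda_k^T\left[h_k + \nabla h_k^T s_k - h(x_k+s_k)\right] + r_k\left[\|h_k + \nabla h_k^T s_k\|_2^2 - \|h(x_k+s_k)\|_2^2\right].$$

So,

$$|Ared_k - Pred_k| \le \left|\ell(x_k,\lambda_k) + \nabla_x \ell(x_k,\lambda_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k - \ell(x_k+s_k,\lambda_k)\right|$$
$$\qquad + \left|\Delta\lambda_k^T\left[h_k + \nabla h_k^T s_k - h(x_k+s_k)\right]\right| + r_k\left|\,\|h_k + \nabla h_k^T s_k\|_2^2 - \|h(x_k+s_k)\|_2^2\,\right|.$$

Hence,

$$|Ared_k - Pred_k| \le \tfrac{1}{2}\left|s_k^T\left[B_k - \nabla_x^2 \ell(x_k+\xi_1 s_k,\lambda_k)\right]s_k\right| + \tfrac{1}{2}\left|s_k^T\left[\nabla^2 h(x_k+\xi_2 s_k)\,\Delta\lambda_k\right]s_k\right|$$
$$\qquad + r_k\left|s_k^T\left[\nabla h_k \nabla h_k^T - \nabla h(x_k+\xi_3 s_k)\nabla h(x_k+\xi_3 s_k)^T\right]s_k\right| + r_k\left|s_k^T\,\nabla^2 h(x_k+\xi_3 s_k)\,h(x_k+\xi_3 s_k)\,s_k\right|,$$

for some $\xi_1, \xi_2, \xi_3 \in (0,1)$. So,

$$|Ared_k - Pred_k| \le \tfrac{1}{2}\left(\|\nabla_x^2 \ell(x_k+\xi_1 s_k,\lambda_k)\|_2 + \|B_k\|_2\right)\|s_k\|_2^2 + \tfrac{1}{2}\,\|\nabla^2 h(x_k+\xi_2 s_k)\,\Delta\lambda_k\|_2\,\|s_k\|_2^2$$
$$\qquad + r_k\,\|\nabla h_k \nabla h_k^T - \nabla h(x_k+\xi_3 s_k)\nabla h(x_k+\xi_3 s_k)^T\|_2\,\|s_k\|_2^2 + r_k\,\|\nabla^2 h(x_k+\xi_3 s_k)\,h(x_k+\xi_3 s_k)\|_2\,\|s_k\|_2^2.$$

Now by using the standard assumptions, we get

$$|Ared_k - Pred_k| \le a_1\,\|s_k\|_2^2 + a_2\,r_k\,\|s_k\|_2^3 + a_3\,r_k\,\|s_k\|_2^2\,\|h_k\|_2.$$

Hence we get the desired result. ■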

The  result  we  obtained  in  the  last  lemma  does  not  depend  on  any  property  of 
the  matrices  {  Bk  }  except  that  they  are  bounded,  and  does  not  depend  on  any 
property  of  the  step. 

Corollary  (4.7) 

Under the assumption of Lemma (4.6), we have

$$|Ared_k - Pred_k| \le a_0\,r_k\,\|s_k\|_2^2,$$

where $a_0$ is a constant independent of $k$.

Proof 

The proof follows immediately from the last lemma, the fact that $r_k \ge 1$, and the standard assumptions. ■

Corollary  (4.7)  shows  that  our  definition  of  predicted  reduction  of  the  merit 
function  gives  an  approximation  to  the  merit  function  that  is  accurate  to  within 
the  square  of  the  steplength. 
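The quadratic accuracy stated in Corollary (4.7) can be observed on a toy problem. The sketch below assumes the merit function $\Phi(x,\lambda;r) = f(x) + \lambda^T h(x) + r\,\|h(x)\|_2^2$ and the form of $Pred_k$ used in the proofs of this chapter; definitions (3.1.1) and (3.1.2) are not reproduced here, so these forms are assumptions. The functions $f$ and $h$, the point, the direction, and all parameter values are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2


def h(x):  # one equality constraint
    return x[0] ** 2 + x[1] ** 2 - 1.0


def grad_f(x):
    return [2.0 * x[0], 4.0 * x[1]]


def grad_h(x):
    return [2.0 * x[0], 2.0 * x[1]]


def merit(x, lam, r):
    return f(x) + lam * h(x) + r * h(x) ** 2


def ared(x, s, lam, dlam, r):
    x_new = [xi + si for xi, si in zip(x, s)]
    return merit(x, lam, r) - merit(x_new, lam + dlam, r)


def pred(x, s, lam, dlam, r, b=1.0):
    """Predicted reduction with B = b * I (an assumption of this sketch)."""
    grad_l = [gf + lam * gh for gf, gh in zip(grad_f(x), grad_h(x))]
    lin = h(x) + dot(grad_h(x), s)  # linearized constraint
    return (-dot(grad_l, s) - 0.5 * b * dot(s, s)
            - dlam * lin + r * (h(x) ** 2 - lin ** 2))


def model_error_ratio(t, lam=0.3, dlam=0.1, r=2.0):
    """|Ared - Pred| / ||s||^2 for the step s = t * d with ||d|| = 1."""
    x, d = [1.0, 0.5], [0.6, -0.8]
    s = [t * di for di in d]
    return abs(ared(x, s, lam, dlam, r) - pred(x, s, lam, dlam, r)) / dot(s, s)
```

For this example the ratio equals $2.54 + 1.6\,t + 2\,t^2$, so it stays bounded as the step length $t$ shrinks, in line with the corollary.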
Lemma (4.8)

If $s_k$ and $s_k^t$ are as in Corollary (4.4), then

$$(\nabla \ell_k + B_k s_k)^T s_k^t \le 0.$$

Proof

If $s_k$ is the step generated from the CDT subproblem, then from (2.2.12) $s_k$ satisfies

$$[B_k + \mu I + \alpha \nabla h_k \nabla h_k^T]\,s_k = -\nabla \ell_k - \alpha \nabla h_k h_k.$$

Equivalently,

$$-(\nabla \ell_k + B_k s_k) = \mu s_k + \alpha \nabla h_k (h_k + \nabla h_k^T s_k).$$

Now

$$-P_k(\nabla \ell_k + B_k s_k) = \mu P_k s_k + \alpha P_k[\nabla h_k (h_k + \nabla h_k^T s_k)].$$

Since $P_k \nabla h_k = 0$, we get

$$-P_k(\nabla \ell_k + B_k s_k) = \mu s_k^t.$$

So, since $P_k^T = P_k$ and $P_k s_k^t = s_k^t$,

$$-(\nabla \ell_k + B_k s_k)^T s_k^t = \mu\,\|s_k^t\|_2^2,$$

which implies that

$$(\nabla \ell_k + B_k s_k)^T s_k^t \le 0.$$

Now, assume that the step is generated from the TRQP subproblem. Then $s_k$ must satisfy

$$(B_k + \mu I)\,s_k = -(\nabla \ell_k + \nabla h_k \Delta\lambda_k).$$

Notice that $\mu = 0$ if the step is generated from the QP subproblem, i.e. if the trust region constraint is not binding. The last equation can be written as

$$\nabla \ell_k + B_k s_k = -\nabla h_k \Delta\lambda_k - \mu s_k.$$

Now, by multiplying by $P_k$, we obtain

$$P_k(\nabla \ell_k + B_k s_k) = -P_k \nabla h_k \Delta\lambda_k - \mu P_k s_k.$$

Again since $P_k \nabla h_k = 0$ we have

$$P_k(\nabla \ell_k + B_k s_k) = -\mu s_k^t,$$

which implies that

$$(\nabla \ell_k + B_k s_k)^T s_k^t = -\mu\,\|s_k^t\|_2^2 \le 0.$$

Finally, assume that the step $s_k = s_k^{cp} = -\alpha_k \nabla h_k h_k$, where $\alpha_k$ is defined by (4.2.1). Then

$$s_k^t = P_k s_k = -\alpha_k P_k \nabla h_k h_k = 0.$$

So,

$$(\nabla \ell_k + B_k s_k)^T s_k^t \le 0.$$

This implies that in all cases the lemma is true. ■


Lemma  (4.9) 

Let $s_k^n$ be as in Corollary (4.4). Then there exists a constant $b_3$ such that:

$$\|s_k^n\|_2 \le b_3\,\|h_k\|_2.$$

Proof

Since

$$\|s_k^n\|_2 = \|Q_k s_k\|_2 = \|\nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T s_k\|_2,$$

equivalently, we can write

$$\|s_k^n\|_2 = \|\nabla h_k(\nabla h_k^T \nabla h_k)^{-1}(h_k + \nabla h_k^T s_k - h_k)\|_2.$$

Hence,

$$\|s_k^n\|_2 \le \|\nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\|_2 \left[\|h_k + \nabla h_k^T s_k\|_2 + \|h_k\|_2\right].$$

Now, from the definition of $s_k$, we can write

$$\|s_k^n\|_2 \le 2\,\|\nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\|_2\,\|h_k\|_2. \tag{4.2.8}$$

Set

$$b_3 = 2 \sup_{x \in \Omega} \|\nabla h(x)(\nabla h(x)^T \nabla h(x))^{-1}\|_2.$$

The result now follows if we substitute $b_3$ in (4.2.8). ■


Lemma  (4.10) 

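Let $s_k$ be the step generated by the algorithm. Let $P_k$, $\bar{\Delta}_k$, $s_k^t$, $s_k^n$ be as in Corollary (4.4) and $\hat{h}_k = \nabla h_k(\nabla h_k^T \nabla h_k)^{-1} h_k$. Then

$$Pred_k \ge \frac{1}{4}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right] - b_4\,\|s_k\|_2\,\|h_k\|_2 - |(\nabla \ell_k + B_k s_k)^T \hat{h}_k|$$
$$\qquad + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right], \tag{4.2.9}$$

where $b_0$ and $b_4$ are constants independent of $k$.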


Proof 
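Since

$$Pred_k = -\nabla \ell_k^T s_k - \tfrac{1}{2} s_k^T B_k s_k - \Delta\lambda_k^T(h_k + \nabla h_k^T s_k) + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right],$$

we can write

$$Pred_k = -(\nabla \ell_k + B_k s_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k + (\nabla \ell_k + B_k s_k)^T \nabla h_k(\nabla h_k^T \nabla h_k)^{-1}(h_k + \nabla h_k^T s_k) + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

Now, since $\nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T s_k = s_k^n$, we can write:

$$Pred_k = -(\nabla \ell_k + B_k s_k)^T s_k + \tfrac{1}{2} s_k^T B_k s_k + (\nabla \ell_k + B_k s_k)^T[\hat{h}_k + s_k^n] + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

Since $s_k - s_k^n = s_k^t$, we get

$$Pred_k = -(\nabla \ell_k + B_k s_k)^T s_k^t + \tfrac{1}{2} s_k^T B_k s_k + (\nabla \ell_k + B_k s_k)^T \hat{h}_k + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

But by using Lemma (4.8), we can write

$$-(\nabla \ell_k + B_k s_k)^T s_k^t \ge -\tfrac{1}{2}(\nabla \ell_k + B_k s_k)^T s_k^t.$$

Now

$$Pred_k \ge -\tfrac{1}{2}(\nabla \ell_k + B_k s_k)^T s_k^t + \tfrac{1}{2} s_k^T B_k s_k + (\nabla \ell_k + B_k s_k)^T \hat{h}_k + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right] \tag{4.2.10}$$
$$\qquad \ge -\tfrac{1}{2}(\nabla \ell_k + B_k s_k^n)^T s_k^t - \tfrac{1}{2}(s_k^t)^T B_k s_k^t + \tfrac{1}{2} s_k^T B_k s_k + (\nabla \ell_k + B_k s_k)^T \hat{h}_k + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right],$$

which can be written as

$$Pred_k \ge -\tfrac{1}{2}(\nabla \ell_k + B_k s_k^n)^T s_k^t + \tfrac{1}{2}(s_k^n)^T B_k s_k^t + \tfrac{1}{2} s_k^T B_k s_k^n + (\nabla \ell_k + B_k s_k)^T \hat{h}_k + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

By using Corollary (4.4), we get

$$Pred_k \ge \frac{1}{4}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,\|B_k\|_2}\right] + \tfrac{1}{2}(s_k^n)^T B_k s_k^t + \tfrac{1}{2} s_k^T B_k s_k^n + (\nabla \ell_k + B_k s_k)^T \hat{h}_k$$
$$\qquad + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

But by Lemma (4.9), $\|s_k^n\|_2 \le b_3\,\|h_k\|_2$, $\|s_k^t\|_2 \le \|s_k\|_2$, and from the standard assumptions there exists a constant $b_0$ such that $\|B_k\|_2 \le b_0$. So, we can write

$$Pred_k \ge \frac{1}{4}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right] - b_0 b_3\,\|s_k\|_2\,\|h_k\|_2 - |(\nabla \ell_k + B_k s_k)^T \hat{h}_k|$$
$$\qquad + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

If we set $b_4 = b_0 b_3$, we will get the result. ■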


The first term and the fourth term in (4.2.9) are positive, and the second and the third are negative. In order to prove that we will get a positive predicted reduction at each iteration, we have to prove that the positive quantities are greater than or equal to the negative quantities; otherwise we have to increase the penalty parameter to ensure that. First we need to get an upper bound on the third quantity: Corollary (4.12) will give us that bound. But first we need the following lemma.

Lemma  (4.11) 

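Let $Q_k$ be as in Corollary (4.4). Then there exist constants $b_5$ and $b_6$ such that

$$\|Q_k(\nabla \ell_k + B_k s_k)\|_2 \le b_5\,\|s_k\|_2 + b_6\,\|s_{k-t_k}\|_2,$$

where $s_{k-t_k}$ is the last acceptable step and $k - t_k \ge 0$.

Proof

We have

$$Q_k(\nabla \ell_k + B_k s_k) = Q_k \nabla f_k + Q_k \nabla h_k \lambda_k + Q_k B_k s_k.$$

Now, since

$$Q_k \nabla f_k = \nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T \nabla f_k = -\nabla h_k \bar{\lambda}_k,$$

where $\bar{\lambda}_k = -(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T \nabla f_k$, and since

$$Q_k \nabla h_k = \nabla h_k(\nabla h_k^T \nabla h_k)^{-1}\nabla h_k^T \nabla h_k = \nabla h_k,$$

we have

$$Q_k \nabla h_k \lambda_k = \nabla h_k \lambda_k = \nabla h_k \lambda_{k-t_k+1}$$
$$\qquad = \nabla h_k\left[-(\nabla h_{k-t_k}^T \nabla h_{k-t_k})^{-1}\nabla h_{k-t_k}^T(\nabla f_{k-t_k} + B_{k-t_k} s_{k-t_k})\right]$$
$$\qquad = \nabla h_k\left[\bar{\lambda}_{k-t_k} - (\nabla h_{k-t_k}^T \nabla h_{k-t_k})^{-1}\nabla h_{k-t_k}^T B_{k-t_k} s_{k-t_k}\right].$$

This implies that

$$\|Q_k(\nabla \ell_k + B_k s_k)\|_2 \le \|\nabla h_k(\bar{\lambda}_k - \bar{\lambda}_{k-t_k})\|_2 + b_1\,\|\nabla h_k\|_2\,\|B_{k-t_k}\|_2\,\|s_{k-t_k}\|_2 + \|B_k\|_2\,\|s_k\|_2. \tag{4.2.11}$$

Now by using the standard assumptions, there exists a constant $b_7$ such that

$$\|\nabla h_k(\bar{\lambda}_k - \bar{\lambda}_{k-t_k})\|_2 \le \|\nabla h_k\|_2\,\|\bar{\lambda}_k - \bar{\lambda}_{k-t_k}\|_2 \le b_7\,\|x_k - x_{k-t_k}\|_2,$$

and since $x_k = x_{k-t_k+1}$, we have

$$\|\nabla h_k(\bar{\lambda}_k - \bar{\lambda}_{k-t_k})\|_2 \le b_7\,\|x_{k-t_k+1} - x_{k-t_k}\|_2 \le b_7\,\|s_{k-t_k}\|_2. \tag{4.2.12}$$

Substituting (4.2.12) in (4.2.11) and using the standard assumptions, we obtain

$$\|Q_k(\nabla \ell_k + B_k s_k)\|_2 \le b_5\,\|s_k\|_2 + b_6\,\|s_{k-t_k}\|_2.$$

Hence we get the desired result. ■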

Corollary  (4.12) 

Let $\hat{h}_k$ be as in Lemma (4.10). Then there exist constants $a_4$ and $a_5$ such that

$$|(\nabla \ell_k + B_k s_k)^T \hat{h}_k| \le \left[a_4\,\|s_k\|_2 + a_5\,\|s_{k-t_k}\|_2\right]\|h_k\|_2,$$

where $s_{k-t_k}$ is the last acceptable step and $k - t_k \ge 0$.

Proof

Since $Q_k \hat{h}_k = \hat{h}_k$, we have

$$|(\nabla \ell_k + B_k s_k)^T \hat{h}_k| = |[Q_k(\nabla \ell_k + B_k s_k)]^T \hat{h}_k| \le \|Q_k(\nabla \ell_k + B_k s_k)\|_2\,\|\hat{h}_k\|_2.$$

Now, by using Lemma (4.11) and the fact that $\|\hat{h}_k\|_2 \le b_8\,\|h_k\|_2$, where $b_8 = \sup_{x \in \Omega} \|\nabla h(x)(\nabla h(x)^T \nabla h(x))^{-1}\|_2$, the proof follows immediately. ■


The following lemma proves that if $\|h_k\|_2$ is small enough, then we do not need to increase the penalty parameter in step (3) of the algorithm.

Lemma  (4.13) 

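Let $k$ index an iteration at which the algorithm does not terminate. If $\|h_k\|_2 \le c_2 \Delta_k$, where $c_2$ is a small constant that satisfies

$$c_2 \le \min\left[\frac{\sqrt{3}}{2\,b_3},\ \frac{\epsilon}{3\,\Delta_*},\ \frac{\epsilon}{3\,b_4\,\Delta_*},\ \frac{\epsilon}{48\,(a_4 + b_4 + a_5)\,\Delta_*}\min\left(1,\ \frac{\epsilon}{3\,b_0\,\Delta_*}\right)\right], \tag{4.2.13}$$

where $a_4$ and $a_5$ are as in Corollary (4.12), $b_3$ is as in Lemma (4.9), $b_4$ is as in Lemma (4.10), and $\Delta_*$ is an upper bound on the trust region radius, then

$$Pred_k \ge \frac{r_k}{2}\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right] + \frac{1}{8}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\frac{1}{2}\Delta_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right].$$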


Proof 


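If $k$ is the index of an iteration at which the algorithm does not terminate, then

$$\|P_k \nabla \ell_k\|_2 + \|h_k\|_2 > \epsilon.$$

But, since $\|h_k\|_2 \le \frac{1}{3}\epsilon$, it follows that

$$\|P_k \nabla \ell_k\|_2 > \frac{2}{3}\epsilon.$$

Now

$$\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \ge \|P_k \nabla \ell_k\|_2 - \|P_k B_k s_k^n\|_2 \ge \|P_k \nabla \ell_k\|_2 - \|B_k\|_2\,\|s_k^n\|_2$$
$$\qquad \ge \|P_k \nabla \ell_k\|_2 - b_0 b_3\,\|h_k\|_2 = \|P_k \nabla \ell_k\|_2 - b_4\,\|h_k\|_2 \ge \frac{2}{3}\epsilon - \frac{1}{3}\epsilon. \tag{4.2.14}$$

Hence,

$$\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \ge \frac{1}{3}\epsilon. \tag{4.2.15}$$

Now, from Lemma (4.10), Corollary (4.12) and $\|h_k\|_2 \le c_2 \Delta_k$, we get

$$Pred_k \ge \frac{1}{4}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right] - c_2\left[b_4\,\|s_k\|_2 + \left(a_4\,\|s_k\|_2 + a_5\,\|s_{k-t_k}\|_2\right)\right]\Delta_k$$
$$\qquad + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right]. \tag{4.2.16}$$

So, by using (4.2.15), we can write

$$Pred_k \ge \frac{1}{8}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right] + \frac{1}{8}\left(\frac{1}{3}\epsilon\right)\min\left[\bar{\Delta}_k,\ \frac{\epsilon}{6\,b_0}\right]$$
$$\qquad - c_2\left[(a_4 + b_4 + a_5)\,\Delta_*\right]\Delta_k + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right]. \tag{4.2.17}$$

Now, since

$$\bar{\Delta}_k = \sqrt{\Delta_k^2 - \|s_k^n\|_2^2},$$

by using Lemma (4.9) and $\|h_k\|_2 \le \frac{\sqrt{3}}{2\,b_3}\Delta_k$, we have

$$\bar{\Delta}_k \ge \sqrt{\Delta_k^2 - b_3^2\,\|h_k\|_2^2},$$

and we obtain

$$\bar{\Delta}_k \ge \sqrt{\Delta_k^2 - (3/4)\,\Delta_k^2} = \frac{1}{2}\Delta_k.$$

By substituting the last inequality in (4.2.17), we get

$$Pred_k \ge \frac{1}{8}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\frac{1}{2}\Delta_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right] + \frac{1}{8}\left(\frac{1}{3}\epsilon\right)\min\left[\frac{1}{2}\Delta_k,\ \frac{\epsilon}{6\,b_0}\right]$$
$$\qquad - c_2\left[(a_4 + b_4 + a_5)\,\Delta_*\right]\Delta_k + r_k\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

Since $c_2$ satisfies inequality (4.2.13), we have

$$Pred_k \ge \frac{1}{8}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\frac{1}{2}\Delta_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right] + \frac{r_k}{2}\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right]. \tag{4.2.18}$$

Hence we get the desired result.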


If $\|h_k\|_2 \le c_2 \Delta_k$, then half of the first term in (4.2.16) would cancel the second and the third terms, and the fourth term need never enter the calculation. This implies that if we set $r_k = r_{k-1}$, inequality (4.2.18) remains correct. So, in this case, we do not need to increase the penalty parameter.
Lemma (4.14)


Let $k$ be the index of an iteration at which the algorithm does not terminate. If $\|h_k\|_2 \le c_2 \Delta_k$, where $c_2$ is as in Lemma (4.13), then there exists a constant $c_3$ such that

$$Pred_k \ge c_3\,\Delta_k.$$


Proof


From (4.2.15) and (4.2.18), we have

$$Pred_k \ge \frac{1}{8}\left(\frac{1}{3}\epsilon\right)\min\left[\frac{1}{2}\Delta_k,\ \frac{\epsilon}{6\,b_0}\right] \ge \frac{1}{48}\,\epsilon \min\left[1,\ \frac{\epsilon}{3\,b_0\,\Delta_*}\right]\Delta_k.$$

The result now follows if we set $c_3 = \frac{1}{48}\,\epsilon \min\left[1,\ \frac{\epsilon}{3\,b_0\,\Delta_*}\right]$.


4.3  THE  BEHAVIOR  OF  THE  PENALTY  PARAMETER 

This  section  is  devoted  to  the  study  of  the  behavior  of  the  penalty  parameter. 
The  following  three  lemmas  are  needed  to  prove  that  the  penalty  parameter  is 
bounded.  In  Lemma  (4.18)  we  prove  that  the  penalty  parameter  will  remain 
bounded  as  long  as  the  algorithm  does  not  terminate. 

Lemma  (4.15) 

If $k$ is the index of an iteration at which the penalty parameter $r_k$ increases, we have

$$r_k \min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le a_6\,\|s_k\|_2 + a_7\,\|s_{k-t_k}\|_2,$$

where $a_6$ and $a_7$ are constants independent of $k$, $s_{k-t_k}$ is the last acceptable step and $k - t_k \ge 0$.

Proof 


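Let $k$ be the index of an iteration at which the penalty parameter increases. Then by step 3 of the algorithm $r_k$ is updated by the following rule:

$$r_k = \frac{2\left[\nabla_x \ell_k^T s_k + \frac{1}{2} s_k^T B_k s_k + \Delta\lambda_k^T(h_k + \nabla h_k^T s_k)\right]}{\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2} + \rho.$$

This can be written as

$$\frac{r_k}{2}\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right] = (\nabla \ell_k + B_k s_k)^T s_k^t - \tfrac{1}{2} s_k^T B_k s_k - (\nabla \ell_k + B_k s_k)^T \hat{h}_k + \frac{\rho}{2}\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

Using Lemma (4.1), inequality (4.2.10) and $s_k = s_k^t + s_k^n$, we get

$$\frac{r_k}{2}\,\frac{\|h_k\|_2}{b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le \tfrac{1}{2}(\nabla \ell_k + B_k s_k^n)^T s_k^t + \tfrac{1}{2}(s_k^t)^T B_k s_k^t - \tfrac{1}{2} s_k^T B_k s_k - (\nabla \ell_k + B_k s_k)^T \hat{h}_k$$
$$\qquad + \frac{\rho}{2}\left[\|h_k\|_2^2 - \|h_k + \nabla h_k^T s_k\|_2^2\right].$$

By using Corollary (4.4), we can write

$$\frac{r_k}{2}\,\frac{\|h_k\|_2}{b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le -\frac{1}{4}\,\|P_k(\nabla \ell_k + B_k s_k^n)\|_2 \min\left[\bar{\Delta}_k,\ \frac{\|P_k(\nabla \ell_k + B_k s_k^n)\|_2}{2\,b_0}\right]$$
$$\qquad + \tfrac{1}{2}(s_k^t)^T B_k s_k^t - \tfrac{1}{2} s_k^T B_k s_k - (\nabla \ell_k + B_k s_k)^T \hat{h}_k - \rho\left[h_k^T \nabla h_k^T s_k + \tfrac{1}{2} s_k^T \nabla h_k \nabla h_k^T s_k\right].$$

Thus,

$$\frac{r_k}{2}\,\frac{\|h_k\|_2}{b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le -\tfrac{1}{2}(s_k^n)^T B_k s_k^t - \tfrac{1}{2} s_k^T B_k s_k^n - (\nabla \ell_k + B_k s_k)^T \hat{h}_k - \rho\,h_k^T \nabla h_k^T s_k,$$

and we can write

$$\frac{r_k}{2}\,\frac{\|h_k\|_2}{b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le \|B_k\|_2\,\|s_k\|_2\,\|s_k^n\|_2 + \rho\,\|\nabla h_k\|_2\,\|s_k\|_2\,\|h_k\|_2 + |(\nabla \ell_k + B_k s_k)^T \hat{h}_k|. \tag{4.3.1}$$

Now by using Corollary (4.12),

$$\frac{r_k}{2}\,\frac{\|h_k\|_2}{b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le \|B_k\|_2\,\|s_k\|_2\,\|s_k^n\|_2 + \left(a_4\,\|s_k\|_2 + a_5\,\|s_{k-t_k}\|_2\right)\|h_k\|_2 + \rho\,\|\nabla h_k\|_2\,\|s_k\|_2\,\|h_k\|_2.$$

But, by Lemma (4.9), $\|s_k^n\|_2 \le b_3\,\|h_k\|_2$, and from the standard assumptions $\|\nabla h_k\|_2 \le b_9$, where $b_9 = \sup_{x \in \Omega} \|\nabla h(x)\|_2$. So

$$\frac{r_k\,\|h_k\|_2}{2\,b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le (b_0 b_3 + a_4 + \rho\,b_9)\,\|s_k\|_2\,\|h_k\|_2 + a_5\,\|s_{k-t_k}\|_2\,\|h_k\|_2.$$

The result follows immediately if we divide by $\|h_k\|_2$; the factor $2\,b_1$ is absorbed into the constants $a_6$ and $a_7$.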


Corollary  (4.16) 

If $k$ is the index of an iteration at which the algorithm does not terminate and the penalty parameter $r_k$ increases, we have

$$r_k\,\Delta_k \le a_8\,\|s_k\|_2 + a_9\,\|s_{k-t_k}\|_2,$$

where $a_8$ and $a_9$ are constants independent of $k$, $s_{k-t_k}$ is the last acceptable step and $k - t_k \ge 0$.

Proof


From Lemma (4.15), if $k$ is the index of an iteration at which the penalty parameter $r_k$ increases, then $r_k$ must satisfy the following inequality:

$$r_k \min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \le a_6\,\|s_k\|_2 + a_7\,\|s_{k-t_k}\|_2.$$

From Lemma (4.13), if $\|h_k\|_2 \le c_2 \Delta_k$, then we do not increase $r_k$. So, for any iteration at which the penalty parameter increases, we must have

$$\|h_k\|_2 > c_2\,\Delta_k,$$

and we get

$$r_k \min\left[\Delta_k,\ \frac{c_2\,\Delta_k}{b_2}\right] \le a_6\,\|s_k\|_2 + a_7\,\|s_{k-t_k}\|_2.$$

This can be written as

$$r_k\,\Delta_k \min\left[1,\ \frac{c_2}{b_2}\right] \le a_6\,\|s_k\|_2 + a_7\,\|s_{k-t_k}\|_2.$$

Hence,

$$r_k\,\Delta_k \le a_8\,\|s_k\|_2 + a_9\,\|s_{k-t_k}\|_2,$$

and we get the desired result. ■


By the standard assumptions, at each iteration at which the penalty parameter increases, $r_k \Delta_k$ is bounded. However, if we can bound $\|s_{k-t_k}\|_2 / \Delta_k$ by a constant independent of $k$, we can get an upper bound on $r_k$ itself. In the following lemma we get a relation between $\|s_{k-t_k}\|_2$ and $\Delta_k$. In Lemma (4.18) we prove that the penalty parameter is bounded.


Lemma  (4.17) 

Let $k$ be the index of any iteration at which the algorithm does not terminate and the penalty parameter $r_k$ increases. Then

$$\Delta_k \ge c_4\,\|s_{k-t_k}\|_2,$$

where $s_{k-t_k}$ is the last acceptable step, $c_4$ is a constant independent of $k$ and $t_k$, and $k - t_k \ge 0$.

Proof 

Let us consider three cases:

First, if $t_k = 1$, i.e., $s_{k-1}$ is the last acceptable step, then from (3.1.3), we have

$$\Delta_k \ge \alpha_1\,\|s_{k-1}\|_2.$$

The result in this case follows if we set $c_4 = \alpha_1$.
Second, if $s_{k-1}$ is not the last acceptable step, but $\|h_{k-i}\|_2 > c_2\,\Delta_{k-i}$ for all $i \in [1, t_k - 1]$. In this case, from Corollary (4.7), we have

$$|Ared_{k-i} - Pred_{k-i}| \le a_0\,r_{k-i}\,\|s_{k-i}\|_2^2.$$

Now, from Corollary (4.2), we have

$$Pred_{k-i} \ge \frac{r_{k-i}\,\|h_{k-i}\|_2}{2\,b_1}\min\left[\Delta_{k-i},\ \frac{\|h_{k-i}\|_2}{b_2}\right].$$

But since all $k-i$, $i = 1, \ldots, t_k - 1$, satisfy $\|h_{k-i}\|_2 > c_2\,\Delta_{k-i} \ge c_2\,\|s_{k-i}\|_2$, we have

$$Pred_{k-i} \ge \frac{r_{k-i}\,\|h_{k-i}\|_2}{2\,b_1 b_2}\,\|s_{k-i}\|_2 \min[b_2, c_2].$$

Hence,

$$\frac{|Ared_{k-i} - Pred_{k-i}|}{Pred_{k-i}} \le \frac{2\,a_0 b_1 b_2\,\|s_{k-i}\|_2}{\min[b_2, c_2]\,\|h_{k-i}\|_2}.$$

But since all $k-i$, $i = 1, \ldots, t_k - 1$, index unacceptable steps, we have

$$(1 - \eta_1) < \left|\frac{Ared_{k-i}}{Pred_{k-i}} - 1\right|.$$

So, for all $i \in [1, t_k - 1]$, we have

$$\|s_{k-i}\|_2 \ge \frac{(1 - \eta_1)}{2\,a_0 b_1 b_2}\min[b_2, c_2]\,\|h_{k-i}\|_2.$$

Now, since $x_{k-1} = x_{k-(t_k-1)}$ and $h_{k-1} = h_{k-(t_k-1)}$, we have

$$\Delta_k \ge \alpha_1\,\|s_{k-1}\|_2 \ge \frac{\alpha_1 (1 - \eta_1)}{2\,a_0 b_1 b_2}\min[b_2, c_2]\,\|h_{k-1}\|_2 = \frac{\alpha_1 (1 - \eta_1)}{2\,a_0 b_1 b_2}\min[b_2, c_2]\,\|h_{k-(t_k-1)}\|_2$$
$$\qquad \ge \frac{\alpha_1 c_2 (1 - \eta_1)}{2\,a_0 b_1 b_2}\min[b_2, c_2]\,\Delta_{k-(t_k-1)} \ge \frac{\alpha_1^2 c_2 (1 - \eta_1)}{2\,a_0 b_1 b_2}\min[b_2, c_2]\,\|s_{k-t_k}\|_2.$$

The result in this case follows by setting

$$c_4 = \frac{\alpha_1^2 c_2 (1 - \eta_1)}{2\,a_0 b_1 b_2}\min[b_2, c_2].$$

Finally, if the step indexed by $k-1$ is not the last acceptable step and not all $i \in [1, t_k - 1]$ satisfy $\|h_{k-i}\|_2 > c_2\,\Delta_{k-i}$, then there exists at least one $j \in [1, t_k - 1]$ such that $\|h_{k-j}\|_2 \le c_2\,\Delta_{k-j}$. Let $l$ be the smallest integer in $[1, t_k - 1]$ such that $\|h_{k-l}\|_2 \le c_2\,\Delta_{k-l}$. For all $i \in [1, l-1]$, we have

$$\|h_{k-i}\|_2 > c_2\,\Delta_{k-i}.$$

As in the first two parts, if we set

$$c_5 = \min\left[\alpha_1,\ \frac{\alpha_1^2 c_2 (1 - \eta_1)}{2\,a_0 b_1 b_2}\min(b_2, c_2)\right], \tag{4.3.2}$$

we get

$$\Delta_k \ge c_5\,\|s_{k-l}\|_2, \tag{4.3.3}$$

where $c_5$ is given by (4.3.2). Now, for $k-l$ we have

$$\|h_{k-l}\|_2 \le c_2\,\Delta_{k-l}. \tag{4.3.4}$$

From Lemma (4.6) we have

$$|Ared_{k-l} - Pred_{k-l}| \le a_1\,\|s_{k-l}\|_2^2 + r_{k-l}\left[a_2\,\|s_{k-l}\|_2^3 + a_3\,\|h_{k-l}\|_2\,\|s_{k-l}\|_2^2\right].$$

By using inequality (4.3.4), we have

$$|Ared_{k-l} - Pred_{k-l}| \le a_1\,\|s_{k-l}\|_2^2 + r_{k-l}\,(a_2 + a_3 c_2)\,\|s_{k-l}\|_2^2\,\Delta_{k-l}. \tag{4.3.5}$$

If $k$ indexes an iteration at which $r_k$ increases, then from Corollary (4.16) and the standard assumptions we know that $r_k \Delta_k$ is bounded. By using inequality (4.3.3), we get

$$r_{k-l}\,\|s_{k-l}\|_2 \le \frac{1}{c_5}\,r_{k-l}\,\Delta_k \le \frac{1}{c_5}\,r_k\,\Delta_k \le m_0,$$

where $m_0$ is a uniform bound.

Hence inequality (4.3.5) can be written as

$$|Ared_{k-l} - Pred_{k-l}| \le a_1\,\|s_{k-l}\|_2^2 + (a_2 + c_2 a_3)\,m_0\,\|s_{k-l}\|_2\,\Delta_{k-l} \le \left[a_1 + (a_2 + c_2 a_3)\,m_0\right]\|s_{k-l}\|_2\,\Delta_{k-l}.$$

By using Lemma (4.14), we get

$$\frac{|Ared_{k-l} - Pred_{k-l}|}{Pred_{k-l}} \le \frac{\left[a_1 + (a_2 + c_2 a_3)\,m_0\right]\|s_{k-l}\|_2\,\Delta_{k-l}}{c_3\,\Delta_{k-l}} = \frac{a_1 + (a_2 + c_2 a_3)\,m_0}{c_3}\,\|s_{k-l}\|_2.$$

But since the step indexed by $k-l$ is not an acceptable step,

$$(1 - \eta_1) < \left|\frac{Ared_{k-l}}{Pred_{k-l}} - 1\right| \le \frac{a_1 + (a_2 + c_2 a_3)\,m_0}{c_3}\,\|s_{k-l}\|_2.$$

Hence, by using inequality (4.3.3), we obtain

$$\Delta_k \ge c_5\,\|s_{k-l}\|_2 \ge \frac{c_3 c_5 (1 - \eta_1)}{a_1 + (a_2 + c_2 a_3)\,m_0} \ge \frac{c_3 c_5 (1 - \eta_1)}{\left[a_1 + (a_2 + c_2 a_3)\,m_0\right]\Delta_*}\,\|s_{k-t_k}\|_2.$$

The result then follows if we set

$$c_4 = \min\left[c_5,\ \frac{c_3 c_5 (1 - \eta_1)}{\left[a_1 + (a_2 + c_2 a_3)\,m_0\right]\Delta_*}\right].$$

This completes the proof.


Lemma  (4.18) 

Under the standard assumptions, if the algorithm does not terminate, the penalty parameter $r_k$ is bounded.

Proof

The proof is by contradiction. Suppose that $r_k$ is not bounded. This implies that there exists an infinite subsequence of indices $\{k_j\}$ at which $\{r_{k_j}\}$ is unbounded. Now, from Lemma (4.13), we never increase the penalty parameter if $\|h_k\|_2 \le c_2\,\Delta_k$. So, $\|h_{k_j}\|_2 > c_2\,\Delta_{k_j}$.

Let $m$ be any integer in $\{k_j\}$. Then from Corollary (4.16) we can write

$$r_m\,\Delta_m \le a_8\,\|s_m\|_2 + a_9\,\|s_{m-t_m}\|_2, \tag{4.3.6}$$

where $s_{m-t_m}$ is the last acceptable step. On the other hand, from Lemma (4.17) we have

$$\Delta_m \ge c_4\,\|s_{m-t_m}\|_2.$$

Hence

$$\frac{\|s_{m-t_m}\|_2}{\Delta_m} \le \frac{1}{c_4}.$$

By substituting the last inequality in (4.3.6), and using $\|s_m\|_2 \le \Delta_m$, we get

$$r_m \le a_8 + \frac{a_9}{c_4}.$$

Set

$$N = a_8 + \frac{a_9}{c_4}.$$

Since $N$ is independent of $m$, it is an upper bound of the sequence $\{r_k\}$, contradicting the assumption that the sequence $\{r_k\}$ has no upper bound. This proves the lemma. ■

From the last lemma, we can conclude that for all $k$, $1 \le r_k \le r_*$, where $r_*$ is a constant independent of $k$.

Since if $r_k$ increases, it will increase by a quantity $\ge \rho$, the number of iterations at which the penalty parameter increases is finite. Hence, there exists a constant $\bar{k}$ such that $r_k = r_{\bar{k}}$ for all $k \ge \bar{k}$.


4.4  THE  GLOBAL  CONVERGENCE  THEORY 


In  this  part  we  present  our  global  convergence  theory.  We  start  by  proving 
that  the  algorithm  is  well  defined  in  the  sense  that  it  always  finds  an  acceptable 
step  from  any  point  that  does  not  satisfy  the  termination  criteria.  Then  we  prove 
that  the  algorithm  will  terminate  at  a  point  within  e  of  a  Kuhn-Tucker  point. 
ful.

We denote by $S(k_1, k_2)$ the set of indices of successful iterations in the interval $[k_1, k_2]$.


The following theorem shows that the algorithm is well defined in the sense that at any iteration either the point $(x_k, \lambda_k)$ is within $\epsilon$ of a Kuhn-Tucker point and the termination condition of the algorithm will be met, or the algorithm will always find an acceptable step.


Theorem  (4.19) 


Under the standard assumptions, either the point $(x_k, \lambda_k)$ is within $\epsilon$ of a Kuhn-Tucker point and the termination condition of the algorithm will be met, or we always find an acceptable step, i.e. the condition $\frac{Ared_{k+j}}{Pred_{k+j}} \ge \eta_1$ will be satisfied for some $j$.


Proof 


If the termination condition of the algorithm is satisfied, then there is nothing to prove. Assume that the point $(x_k, \lambda_k)$ does not satisfy the termination condition in step 1 of the algorithm.

First, we assume that $\|h_k\|_2 > c_2\,\Delta_k$, where $c_2$ is as in Lemma (4.13). Since, from Corollary (4.2), we have

$$Pred_k \ge \frac{r_k}{2}\,\frac{\|h_k\|_2}{b_1}\min\left[\Delta_k,\ \frac{\|h_k\|_2}{b_2}\right] \ge \frac{r_k}{2}\,\frac{\|h_k\|_2\,\Delta_k}{b_1 b_2}\min[b_2, c_2],$$

and since from Corollary (4.7),

$$|Ared_k - Pred_k| \le a_0\,r_k\,\Delta_k^2, \tag{4.4.1}$$

then, we have

$$\frac{|Ared_k - Pred_k|}{Pred_k} \le \frac{2\,a_0 b_1 b_2\,\Delta_k}{\|h_k\|_2 \min[b_2, c_2]}.$$

That is,

$$\left|\frac{Ared_k}{Pred_k} - 1\right| \le \frac{2\,a_0 b_1 b_2}{\|h_k\|_2 \min[b_2, c_2]}\,\Delta_k.$$

Now, as $\Delta_k$ gets smaller, the quantity $\left|\frac{Ared_k}{Pred_k} - 1\right|$ approaches 0 and hence the condition $\frac{Ared_k}{Pred_k} \ge \eta_1$ will be met after a finite number of trials.

Now, assume that $\|h_k\|_2 \le c_2\,\Delta_k$. From Lemma (4.14) we have

$$Pred_k \ge c_3\,\Delta_k.$$

This gives, using (4.4.1), that

$$\left|\frac{Ared_k}{Pred_k} - 1\right| \le \frac{a_0\,r_k}{c_3}\,\Delta_k.$$

So, as $\Delta_k$ gets smaller, the quantity $\left|\frac{Ared_k}{Pred_k} - 1\right|$ approaches 0, and hence the condition $\frac{Ared_k}{Pred_k} \ge \eta_1$ will be met after a finite number of trials. This completes the proof. ■
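The finite-termination argument can be made concrete with a small counting sketch. It assumes a bound of the form $|Ared_k/Pred_k - 1| \le C\,\Delta_k$, as derived in both cases of the proof, and a radius that is multiplied by a fixed factor after each rejected trial; the factor `shrink`, the default `eta1`, and the function name are hypothetical, since the radius update rule (3.1.3) is not reproduced here.

```python
def trials_until_accept(bound_const, delta0, eta1=0.25, shrink=0.5, max_trials=200):
    """Count radius reductions until bound_const * delta <= 1 - eta1.

    Once bound_const * delta <= 1 - eta1, the bound |Ared/Pred - 1| <= C * delta
    guarantees Ared/Pred >= eta1, so the step must be accepted by then.
    Returns (number of reductions, final radius).
    """
    delta, trials = delta0, 0
    while bound_const * delta > 1.0 - eta1:
        delta *= shrink
        trials += 1
        if trials >= max_trials:
            raise RuntimeError("radius did not become small enough")
    return trials, delta
```

For instance, with $C = 8$, $\Delta = 1$ and $\eta_1 = 0.25$, four halvings bring the radius to $0.0625$, where $C\,\Delta = 0.5 \le 0.75$. This is only a worst-case count implied by the bound; an actual run may accept a step earlier.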


The  following  theorem  proves  that  under  the  standard  Assumptions,  either 
the  algorithm  terminates,  or  converges  to  a  feasible  point. 
Theorem (4.20)

Let the standard assumptions hold. Assume that $\{\Phi_k\}$ is bounded below on $\Omega$. If the algorithm does not terminate, then

$$\lim_{k \to \infty} \|h_k\|_2 = 0.$$

Proof 


Suppose $\limsup \|h_k\|_2 = \epsilon_0 > 0$. Then there exists an infinite sequence of indices $\{k_j\}$ such that $\|h_k\|_2 \ge \frac{\epsilon_0}{2}$ for all $k \in \{k_j\}$.

Let $\hat{k}$ be such that $\hat{k} \in \{k_j\}$, $\hat{k} \ge \bar{k}$ and $\Delta_{\hat{k}} > 0$. Since $h \in C^2$, we have for some $\beta > 0$ and any $x \in \Omega$ that

$$\|h(x)\|_2 \ge \|h_{\hat{k}}\|_2 - \|h(x) - h_{\hat{k}}\|_2$$


First  we  will  show  that  eventually  the  iterate  must  move  outside  Ba  . 
If  xk  e  Ba  for  all  k  >  k  ,  then  from  lemma  (4.2)  and  rk  >  1  , 


Predk  > 


1  1 1  h  1 12 

- - min 
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> 


1 

2 


II  hk  |  1 2 

2  b1 


min  [  Ak 


II  hk  |  1 2 

2  b2 


If no step with index $k \ge \hat{k}$ is acceptable, then we get a contradiction with Theorem
(4.19). Hence, there exists an infinite sequence of successful steps inside the ball.
For any such $k$ we have

\[ \Phi_k - \Phi_{k+1} = Ared_k \ge \eta_1 Pred_k
   \ge \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2}{2 b_1} \min \left[ \Delta_k , \frac{\| h_{\hat{k}} \|_2}{2 b_2} \right] . \qquad (4.4.2) \]


Since $\{ \Phi_k \}$ is bounded below and $\| h_{\hat{k}} \|_2 > 0$, inequality (4.4.2) implies
that

\[ \liminf_{k \to \infty} \Delta_k = 0 . \qquad (4.4.3) \]

Define  ox  to  be  a  constant  that  satisfies: 


<  min  [  1  , 


a  b  Ar 


lUi 


<*  1  r*  (  1  -  V2  ) 


where $a = \max [ r_* , 2 r_* a_0 ]$ and $b = \max [ b_1 , b_2 ]$. Now, because of (4.4.3),
there exists some sufficiently large $k$ such that


A*  -  (4-4-4) 

Let $m$ be the first integer greater than $\hat{k}$ such that (4.4.4) holds. This implies
that $m \ge \hat{k} + 1$, and using (3.1.3) we get

"  1 1  *m-l  1 12  <  — 

"I 
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<  —  (1-*,) 


(4.4.5) 


<  <7J  (  1  -  7?2  )  <  crX  • 


Now,  by  Lemma  (4.2) 


p  j  \  1  II  ^m-1  I  1 2 
*redm-i  >  — - : - mm  |  1 1  s 


II  *m-l  Ih 


2  6X 
and  since  m  —  1  >  k  ,  we  have 


m— 1  I  12  > 


^m-l  I  I  2  > 


I U*  II, 


>  <?1  • 


From  (4.4.6)  and  (4.4.8)  we  have 


6  I  I  I  I2  5;  II  ^m-1  I  1 2  - 

By  substituting  the  last  inequality  and  (4.4.8)  into  (4.4.7),  we  obtain 


Predm_ i  >  — — 
m  1  ~  2  b 


sm- 1  I  12 


But,  by  Corollary  (4.7), 


So, 


|  Aredm_i  Predm_ j  |  <C.  a0  f*  II  sm— 1  1 12 


|  ^re^m- 1  Predm_  1  ^  2  a0  b  r*  1 1  gm-1  1 1| 

Predm-i  ~  *1  II  Sm_x  ||2 


Now  using  (4.4.5),  we  obtain 


Aredm-i  —  Predm_1  2  o0  r,2  <7, 

1  <  - r - (  1  -  h2  ) 


Predm_l 


cr1  a 


(4.4.6) 


(4.4.7) 


(4.4.8) 


(4.4.9) 


<  (  1  —  »?2  )  • 
This implies that

\[ \frac{Ared_{m-1}}{Pred_{m-1}} \ge \eta_2 . \]

Hence from the rule of updating the radius of the trust region, we have

\[ \Delta_{m-1} \le \Delta_m . \]

The last inequality implies that $k = m-1$ satisfies (4.4.4). This contradicts the
supposition that $m$ is the smallest such index and means that there is no
$m > \hat{k}$ such that (4.4.4) holds. Hence, for all $k \ge \hat{k}$, we have


A*  > 


Q?!  al  r* 
a  b 


(  1  -  %  ) 


which contradicts (4.4.3). Hence, eventually $\{ x_k \}$ must leave the ball $B_\sigma$ for
some $k > \hat{k}$.

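Let $l+1$ be the first integer greater than $\hat{k}$ such that $x_{l+1}$ does not lie inside the
ball $B_\sigma$. Since $x_{l+1} \ne x_{\hat{k}}$, there must exist at least one acceptable step in the set
of iterates indexed $\{ \hat{k} , \ldots , l \}$, so by Lemma (4.2),

\[ \Phi_{\hat{k}} - \Phi_{l+1} = \sum_{k=\hat{k}}^{l} ( \Phi_k - \Phi_{k+1} )
   \ge \sum_{k \in S(\hat{k},l)} \eta_1 Pred_k
   \ge \sum_{k \in S(\hat{k},l)} \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2}{2 b_1} \min \left[ \Delta_k , \frac{\| h_{\hat{k}} \|_2}{2 b_2} \right] . \]

If $\Delta_k \le \| h_{\hat{k}} \|_2 / ( 2 b_2 )$ for all $k \in S(\hat{k},l)$, then

\[ \Phi_{\hat{k}} - \Phi_{l+1} \ge \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2}{2 b_1} \sum_{k \in S(\hat{k},l)} \Delta_k
   \ge \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2}{2 b_1} \sigma . \]

Otherwise,

\[ \Phi_{\hat{k}} - \Phi_{l+1} \ge \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2^2}{4 b_1 b_2} . \]

In either case

\[ \Phi_{\hat{k}} - \Phi_{l+1}
   \ge \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2}{2 b_1} \min \left[ \sigma , \frac{\| h_{\hat{k}} \|_2}{2 b_2} \right]
   = \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2}{2 b_1} \min \left[ \frac{\| h_{\hat{k}} \|_2}{2 \beta} , \frac{\| h_{\hat{k}} \|_2}{2 b_2} \right]
   = \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2^2}{4 b_1} \min \left[ \frac{1}{\beta} , \frac{1}{b_2} \right] . \qquad (4.4.10) \]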

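Since $\{ \Phi_k \}$ is bounded below and a decreasing sequence, $\{ \Phi_k \}$ converges to some
limit $\Phi_*$. Taking the limit as $l$ goes to infinity in inequality (4.4.10), we get

\[ \Phi_{\hat{k}} - \Phi_* \ge \frac{\eta_1}{2} \frac{\| h_{\hat{k}} \|_2^2}{4 b_1} \min \left[ \frac{1}{\beta} , \frac{1}{b_2} \right] . \]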

If we take the limit as k goes to infinity, we get

0  > 

Vi 

e0  •  r  1 

- mm  —  ,  - 

i-] 

2 

8  6  x  1  0  ’ 

b-2  1 

which  contradicts  e0  >  0  .  The  supposition  is  wrong  and  hence  the  theorem  is 
proven.  ■ 


Theorem  (4.21)

Let the standard assumptions hold. Assume that $\{ \Phi_k \}$ is bounded below on $\Omega$.
If the algorithm does not terminate, we have

\[ \liminf_{k \to \infty} \| P_k \nabla l_k \|_2 = 0 . \]

Proof 


The proof is by contradiction. Suppose that there exists an $\epsilon_0 > 0$ and an
integer $K$ such that $\| P_k \nabla l_k \|_2 > \epsilon_0$ for all $k \ge K$.

Since,  by  using  (4.2.14), 

lln(  VI,  +B„  ij)  II,  >  1 1  Pk  VI,  II,  -  6,  II  A,  II,, 

and  since 


lira  ||  hk  ||2  =  0, 

k—+cc 

there  exist  kx  sufficiently  large  such  that  for  all  k  >  kt  ,  we  have 

II  K  ||2  <  e0  . 

Thus for $k \ge \max [ K , k_1 ]$

\[ \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \ge \epsilon_0 - \tfrac{1}{2} \epsilon_0 = \tfrac{1}{2} \epsilon_0 . \]
Now,  since  from  (4.2.9)  and  Corollary  (4.12), 


Predk  >  -J  I  I  Pk  (  ^4  +  Bk  s%  )  1 12  min  [  Ak 


\\Pk(Vlk+Bk$i)\\, 


~  (bi  1 1  h  I  U  1 1  h  1 12)  -  («4  1 1  h  1 12  +  a5  1 1  sk_h  |  |2)  I  I  hk  1 12  , 

and since $\| h_k \|_2$ converges to zero and $\| s_k \|_2$ and $\| s_{k-1} \|_2$ are
bounded, then there exists an integer $k_2 \ge \max \{ K , k_1 \}$ such that for all
$k \ge k_2$ we have

\[ Pred_k \ge \frac{1}{8} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \frac{1}{2} \Delta_k , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right] . \]

Thus, for all $k \ge k_2$, we have

\[ Pred_k \ge \frac{1}{8} \frac{\epsilon_0}{2} \min \left[ \frac{1}{2} \Delta_k , \frac{\epsilon_0}{4 b_0} \right] . \]


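From Theorem (4.19) there exists an infinite sequence of successful iterations.
Now, for any successful iteration indexed $k \ge k_2$, we have

\[ Ared_k \ge \eta_1 Pred_k \ge \frac{\eta_1}{32} \epsilon_0 \min \left[ \Delta_k , \frac{\epsilon_0}{2 b_0} \right] . \]

If $\hat{k}_2 \ge \max [ k_2 , \bar{k} ]$, then the last inequality and the assumption that $\{ \Phi_k \}$ is
bounded below imply that

\[ \infty > \sum_{k=\hat{k}_2}^{\infty} ( \Phi_k - \Phi_{k+1} ) = \sum_{k \in S(\hat{k}_2,\infty)} Ared_k
   \ge \sum_{k \in S(\hat{k}_2,\infty)} \frac{\eta_1}{32} \epsilon_0 \min \left[ \Delta_k , \frac{\epsilon_0}{2 b_0} \right] . \]

This implies that

\[ \liminf_{k \to \infty} \Delta_k = 0 . \qquad (4.4.11) \]

This means that there exists an integer $k_3 \ge k_2$ such that

\[ \Delta_k < \frac{\alpha_1 \sigma_2}{a} ( 1 - \eta_2 ) \qquad (4.4.12) \]

is satisfied for some $k \ge k_3$, where $a = \max [ 1 , 32 a_0 r_* / \epsilon_0 ]$ and $\sigma_2$ is defined to
be a constant that satisfies

\[ \sigma_2 \le \min \left[ 1 , \frac{a \Delta_{k_3}}{\alpha_1 ( 1 - \eta_2 )} , \frac{\epsilon_0}{2 b_0} \right] . \]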


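Let $m$ be the first integer greater than $k_3$ such that (4.4.12) holds. This implies
that $m \ge k_3 + 1$. So, from (3.1.3),

\[ \| s_{m-1} \|_2 \le \frac{1}{\alpha_1} \Delta_m < \frac{\sigma_2}{a} ( 1 - \eta_2 )
   \le \sigma_2 ( 1 - \eta_2 ) \le \sigma_2 \le \frac{\epsilon_0}{2 b_0} . \qquad (4.4.13) \]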


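But since

\[ Pred_{m-1} \ge \frac{\epsilon_0}{32} \min \left[ \| s_{m-1} \|_2 , \frac{\epsilon_0}{2 b_0} \right] , \]

we obtain

\[ Pred_{m-1} \ge \frac{\epsilon_0}{32} \| s_{m-1} \|_2 . \]

So, by using (4.4.9), (4.4.13) and the last inequality, we get

\[ \frac{| Ared_{m-1} - Pred_{m-1} |}{Pred_{m-1}} \le \frac{32 a_0 r_* \| s_{m-1} \|_2}{\epsilon_0}
   \le \frac{32 a_0 r_* \sigma_2}{\epsilon_0 a} ( 1 - \eta_2 )
   \le \sigma_2 ( 1 - \eta_2 ) \le ( 1 - \eta_2 ) . \]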
The last inequality implies that

\[ \frac{Ared_{m-1}}{Pred_{m-1}} \ge \eta_2 . \]

Hence, from the rule of updating the radius of the trust region in Algorithm
(3.1.2), we obtain

\[ \Delta_{m-1} \le \Delta_m . \]

This implies that $m-1$ satisfies (4.4.12), which contradicts the assumption that
$m$ is the smallest integer $> k_3$ such that (4.4.12) holds. Hence, for all $k \ge k_3$,
we have

\[ \Delta_k \ge \frac{\alpha_1 \sigma_2}{a} ( 1 - \eta_2 ) . \]

The last inequality contradicts (4.4.11). The supposition is wrong and hence the
theorem is proven.  ■

Corollary  (4.22) 

Under the standard assumptions, if $\{ \Phi_k \}$ is bounded below, then

\[ \liminf_{k \to \infty} \left[ \| h_k \|_2 + \| P_k \nabla l_k \|_2 \right] = 0 . \]

Proof 

The  proof  follows  immediately  from  Theorem  (4.20)  and  Theorem  (4.21). 


From the last corollary and the termination condition in step 1 of the algorithm,
we can conclude that the algorithm will terminate at a point within $\epsilon$ of a
Kuhn-Tucker point.
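
The quantity driven to zero in Corollary (4.22) can be evaluated directly. The sketch below is illustrative only: it assumes that $P_k$ is the orthogonal projector onto the null space of $\nabla h_k^T$, that $\nabla h_k$ has full column rank, and that the Lagrangian is $l = f + \lambda^T h$. The function name `kuhn_tucker_residual` is hypothetical, and the exact termination test of step 1 is not reproduced here.

```python
import numpy as np


def kuhn_tucker_residual(grad_f, jac_h, h, lam):
    """Return ||h||_2 + ||P grad_l||_2 (assumed form of the stopping measure).

    grad_f : gradient of f, shape (n,)
    jac_h  : matrix whose columns are the constraint gradients, shape (n, m)
    h      : constraint values, shape (m,)
    lam    : multiplier estimate, shape (m,)
    """
    A = np.asarray(jac_h, dtype=float)
    grad_l = grad_f + A @ lam
    # Orthogonal projector onto the null space of A^T (A assumed full column rank).
    P = np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)
    return float(np.linalg.norm(h) + np.linalg.norm(P @ grad_l))
```

As a usage example, for the toy problem of minimizing $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, the residual vanishes at $x = (-1,-1)$ with $\lambda = 1/2$ and is positive at infeasible points.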
CHAPTER  FIVE

THE  LOCAL  ANALYSIS 


In this chapter we discuss the local analysis of our algorithm when the
sequence {xk} converges to a solution x*. We will assume that x* satisfies the
second order sufficiency condition.

In Section 5.1 we state the local assumptions. The local analysis of our algorithm
is presented in Section 5.2. It consists of three parts. In the first part of
this section we study the behavior of the penalty parameter in a neighborhood of
x*. In the second part we discuss the decrease we get in the model by the trial
step. The third part of this section is devoted to studying the local rate of
convergence of our algorithm in a neighborhood of the minimizer x*. We will show
that, in a neighborhood of the minimizer, the algorithm will reduce to the standard
SQP algorithm; hence the local rate of convergence of SQP is maintained.

5.1  THE  LOCAL  ASSUMPTIONS 

We make the following assumptions:

1)  The sequence $\{ x_k \}$ converges to a Kuhn-Tucker point $x_*$.

2)  $x_*$ satisfies the second order sufficiency condition, i.e. there exists a $\lambda_*$
such that

\[ v^T \nabla_x^2 l ( x_* , \lambda_* ) v > 0 \]

for all nonzero $v$ that satisfy $\nabla h ( x_* )^T v = 0$.

3)  $\nabla_x^2 l$ is Lipschitz continuous with respect to $x$ in the neighborhood of the
solution $x_*$.

4)  There exists $k_0$ sufficiently large such that for all $k \ge k_0$, we have

\[ \| Q_k \nabla l_k \|_2 \le e_0 \| s_k \|_2 \]

where $e_0$ is a constant.
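
Assumption (2) can be checked numerically at a candidate point. The following is a minimal sketch under the assumption that the Hessian of the Lagrangian and the matrix of constraint gradients are available as dense arrays; the function name `satisfies_second_order_sufficiency` and the tolerance are hypothetical. It builds an orthonormal basis $Z$ of the null space of $\nabla h(x_*)^T$ from a singular value decomposition and tests whether the reduced Hessian $Z^T \nabla_x^2 l \, Z$ is positive definite.

```python
import numpy as np


def satisfies_second_order_sufficiency(hess_l, jac_h, tol=1e-10):
    """Check v^T hess_l v > 0 for all nonzero v with jac_h^T v = 0.

    hess_l : symmetric Hessian of the Lagrangian, shape (n, n)
    jac_h  : matrix whose columns are the constraint gradients, shape (n, m)
    """
    A = np.asarray(jac_h, dtype=float)
    U, svals, _ = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(svals > tol))
    Z = U[:, rank:]                      # orthonormal basis of null(A^T)
    if Z.shape[1] == 0:
        return True                      # only v = 0 is feasible; nothing to check
    reduced = Z.T @ np.asarray(hess_l, dtype=float) @ Z
    return bool(np.min(np.linalg.eigvalsh(reduced)) > tol)
```

Note that the full Hessian may be indefinite while the condition still holds, since only directions tangent to the constraints matter.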

Remarks 

Assumption (4) is equivalent to assuming that the asymptotic progress in $\lambda$
is at least of the same order as the asymptotic progress in $x$.

Numerical experiments have shown that, for SQP, $s_k^{QP}$ and $\Delta\lambda_k^{QP}$ have not
failed to satisfy

\[ \| \Delta\lambda_k \|_2 \le c \| s_k \|_2 , \qquad (5.1.1) \]

in the neighborhood of the solution, where "c" in this remark is used to denote a
generic constant independent of $k$. If the step is the SQP step then inequality
(5.1.1) implies Assumption (4) since $\| Q_k \nabla l_k \|_2 \le c \| \Delta\lambda_k \|_2 + c \| s_k \|_2$.
On the other hand, if $s_k$ is the CDT step and if $\| s^{CDT} \|_2 \approx \| s^{QP} \|_2$,
then $\| \Delta\lambda^{CDT} \|_2$ will be near $\| \Delta\lambda^{QP} \|_2$, since $\Delta\lambda$ is linear in $s$, and we
expect the CDT step to have the same behavior. If $s^{CDT}$ and $s^{QP}$ are different,
we expect $\Delta\lambda^{CDT}$ to give better progress in $\lambda$ than that obtained from $\Delta\lambda^{QP}$
because numerical experiments show that if $s^{QP}$ is a bad step then $\Delta\lambda^{QP}$ will
also be a bad step.


Assumption (4) and more is assumed by Gill, Murray, Saunders and Wright
(1986).


5.2  THE  LOCAL  ANALYSIS  OF  THE  ALGORITHM 


This  section  is  devoted  to  presenting  the  local  analysis  of  our  algorithm  when 
it converges to a local minimizer that satisfies the second order sufficiency condition.
In Section 5.2.1 we study the behavior of the penalty parameter. We will
prove  that  under  the  local  assumptions  the  penalty  parameter  is  bounded.  In 
Section  5.2.2  we  discuss  the  predicted  reduction  that  will  be  obtained  locally.  The 
third  part  of  this  section  is  devoted  to  studying  the  local  rate  of  convergence  of 
our  algorithm  in  the  neighborhood  of  a  minimizer  that  satisfies  the  second  order 
sufficiency  condition. 

5.2.1)  The  Asymptotic  Behavior  of  The  Penalty  Parameter 

In  this  section  we  prove  lemmas  needed  to  study  the  behavior  of  the  penalty 
parameter.  In  Lemma  (5.5)  we  prove  under  the  local  assumptions  that  the 
penalty  parameter  is  bounded  in  a  neighborhood  of  a  minimizer  that  satisfies  the 
second  order  sufficiency  condition. 

Lemma  (5.1) 

In a neighborhood of a minimizer that satisfies the second order sufficiency condition,
there exists a constant $e_1$ such that

\[ \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \ge e_1 \| s_k^t \|_2 \]

where $P_k$, $s_k^t$ and $s_k^n$ are as in Corollary (4.3).

Proof

Since, using Lemma (4.8), we have

\[ ( s_k^t )^T B_k s_k^t \le - [ P_k ( \nabla l_k + B_k s_k^n ) ]^T s_k^t . \qquad (5.2.1) \]

The last inequality can be written as

\[ ( s_k^t )^T ( P_k B_k P_k ) s_k^t \le - [ P_k ( \nabla l_k + B_k s_k^n ) ]^T s_k^t . \]

Now, since $( P_k B_k P_k )$ is positive definite in a neighborhood of the minimizer,
then there exists a constant $e_1$ such that

\[ e_1 \| s_k^t \|_2^2 \le ( s_k^t )^T ( P_k B_k P_k ) s_k^t . \qquad (5.2.2) \]

So, using (5.2.1) and (5.2.2), we can write

\[ e_1 \| s_k^t \|_2^2 \le \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \| s_k^t \|_2 . \]

Hence we get the desired result.  ■

Lemma  (5.2) 

In a neighborhood of a minimizer that satisfies the second order sufficiency condition,
if $\| h_k \|_2 \le e_2 \| s_k \|_2$ where $e_2 \le 1 / ( 2 b_3 )$ and $b_3$ is as in Lemma
(4.9), then there exists a constant $e_3$ such that

\[ \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \ge e_3 \| s_k \|_2 . \]

Proof

Since $\| s_k \|_2 \le \| s_k^t \|_2 + \| s_k^n \|_2$, by using Lemma (4.9) and Lemma (5.1),
we get

\[ e_1 \| s_k \|_2 \le \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 + e_1 b_3 \| h_k \|_2
   \le \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 + e_1 e_2 b_3 \| s_k \|_2 . \]

Hence,

\[ e_1 ( 1 - e_2 b_3 ) \| s_k \|_2 \le \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 . \]

So,

\[ \frac{e_1}{2} \| s_k \|_2 \le \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 . \]

The result then follows if we set $e_3 = e_1 / 2$.


Lemma  (5.3) 


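Let $s_k$ be the step generated by the algorithm. Let $P_k$, $\bar{\Delta}_k$, $s_k^n$ and $s_k^t$ be as
in Corollary (4.3), then for all $k$ sufficiently large, there exists a constant $e_4$
such that

\[ Pred_k \ge \frac{1}{4} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \bar{\Delta}_k , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right]
   - e_4 \| s_k \|_2 \| h_k \|_2 + r_k \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] . \qquad (5.2.3) \]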


Proof 


Since,  from  Lemma  (4.10),  we  have 

Predt  >  i  1 1  Pt(yi„  +  B„il )  1 12  min  [  A,  .  "  P*(V'l ,+  Btij  )  "2  ] 

2  60 


-‘.II  h  It  II  h  ||2-  I  (V4  +  | 
$+ \; r_k \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] .$

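Then, by using Assumption (4) of Section 5.1, for all $k \ge k_0$, we have:

\[ Pred_k \ge \frac{1}{4} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \bar{\Delta}_k , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right]
   - b_4 \| s_k \|_2 \| h_k \|_2 - ( e_0 + b_0 ) b_8 \| s_k \|_2 \| h_k \|_2
   + r_k \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] . \]

Hence, if we set $e_4 = b_4 + ( e_0 + b_0 ) b_8$, we get for all $k \ge k_0$

\[ Pred_k \ge \frac{1}{4} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \bar{\Delta}_k , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right]
   - e_4 \| s_k \|_2 \| h_k \|_2 + r_k \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] . \]

Hence we get the desired result.  ■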


The first term and the third term in (5.2.3) are positive, and the second is
negative. In order to prove that we get a positive predicted reduction at each
iteration, we have to prove that the positive quantities are greater than or equal
to the negative quantity; otherwise we have to increase the penalty parameter to
ensure that.


Lemma  (5.4) 

Under the local assumptions, if $\| h_k \|_2 \le e_5 \| s_k \|_2$ where $e_5$ is chosen
such that:

\[ e_5 \le \min \left[ \frac{1}{2 b_3} , \frac{e_3 \min ( \sqrt{3} b_0 , e_3 )}{16 b_0 e_4} \right] \qquad (5.2.4) \]

where $b_0$ is a uniform upper bound on $\| B_k \|_2$, $b_3$ is as in Lemma (4.9), $e_3$
is as in Lemma (5.2) and $e_4$ is as in Lemma (5.3). Then

\[ Pred_k \ge \frac{1}{8} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \frac{\sqrt{3}}{2} \| s_k \|_2 , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right]
   + \frac{r_k}{2} \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] . \qquad (5.2.5) \]


Proof 


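From Lemma (5.3), we have

\[ Pred_k \ge \frac{1}{4} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \bar{\Delta}_k , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right]
   - e_4 \| s_k \|_2 \| h_k \|_2 + r_k \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] . \]

Now,

\[ \bar{\Delta}_k = \sqrt{ \Delta_k^2 - \| s_k^n \|_2^2 } . \]

By using Lemma (4.9) and $\| h_k \|_2 \le \frac{1}{2 b_3} \| s_k \|_2$,

\[ \bar{\Delta}_k \ge \sqrt{ \Delta_k^2 - b_3^2 \| h_k \|_2^2 } , \]

and we obtain

\[ \bar{\Delta}_k \ge \sqrt{ 1 - ( 1/4 ) } \, \| s_k \|_2 = \frac{\sqrt{3}}{2} \| s_k \|_2 . \qquad (5.2.6) \]

Now, since $\| h_k \|_2 \le e_5 \| s_k \|_2$ and $e_5 \le e_2$, then by using Lemma (5.2) we
have $\| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \ge e_3 \| s_k \|_2$, and by using (5.2.4) and (5.2.6),
we get

\[ \frac{1}{8} \| P_k ( \nabla l_k + B_k s_k^n ) \|_2 \min \left[ \bar{\Delta}_k , \frac{\| P_k ( \nabla l_k + B_k s_k^n ) \|_2}{2 b_0} \right] - e_4 \| s_k \|_2 \| h_k \|_2
   \ge \frac{1}{8} e_3 \| s_k \|_2^2 \min \left[ \frac{\sqrt{3}}{2} , \frac{e_3}{2 b_0} \right] - e_4 e_5 \| s_k \|_2^2
   \ge 0 . \]

The rest of the proof follows immediately.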


In the last lemma, we have proven that if $\| h_k \|_2 \le e_5 \| s_k \|_2$, then half
of the first term in (5.2.3) cancels the second term, and the third term need
never enter the calculation. This implies that if we set $r_k = r_{k-1}$, inequality
(5.2.5) remains correct. So, in this case, we do not need to increase the penalty
parameter.
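
The decision of whether the penalty parameter must grow can be sketched in code. This is an assumption-laden illustration, not the update scheme of Chapter 3 (which is not reproduced in this section): it takes the predicted reduction to be the model part plus $r$ times the linearized feasibility gain, as in the expression for Ared in the proof of Theorem (5.8), and it assumes the test "Pred is at least $(r/2)$ times the feasibility gain", suggested by the last term of (5.2.5). The names `split_predicted_reduction`, `update_penalty`, and `rho` are hypothetical.

```python
import numpy as np


def split_predicted_reduction(grad_l, B, s, dlam, h, jac_h):
    """Return (model_part, feasibility_gain) so that Pred(r) = model_part + r * gain."""
    lin = h + jac_h.T @ s                          # linearized constraints h + grad_h^T s
    model_part = -(grad_l @ s) - 0.5 * (s @ B @ s) - dlam @ lin
    gain = h @ h - lin @ lin                       # ||h||^2 - ||h + grad_h^T s||^2
    return float(model_part), float(gain)


def update_penalty(model_part, gain, r_prev, rho):
    """Keep r_prev if Pred >= (r_prev / 2) * gain; otherwise raise it (assumed rule)."""
    if model_part + r_prev * gain >= 0.5 * r_prev * gain:
        return r_prev                              # no increase is needed
    if gain <= 0.0:
        raise ValueError("the step gives no linearized feasibility gain")
    # Smallest r passing the test is -2 * model_part / gain; add rho as a margin.
    return -2.0 * model_part / gain + rho
```

With this rule, whenever the parameter changes it grows by more than `rho`, which is the property used later to conclude that only finitely many increases occur once the parameter is known to be bounded.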


Lemma  (5.5) 

Under  the  local  assumptions,  the  penalty  parameter  is  bounded. 

Proof 

The proof is by contradiction. Suppose that $\{ r_k \}$ is not bounded. This implies
that there exists an infinite subsequence of indices $\{ k_j \}$ at which $\{ r_k \}$ is
unbounded. Now, from Lemma (5.4), we never increase the penalty parameter if
$\| h_k \|_2 \le e_5 \| s_k \|_2$. So, for any $k \in \{ k_j \}$,

\[ \| h_k \|_2 > e_5 \| s_k \|_2 . \qquad (5.2.7) \]
Let $m \in \{ k_j \}$. By using (4.3.1), we can write

\[ r_m \frac{\| h_m \|_2}{2 b_1} \min \left[ \Delta_m , \frac{\| h_m \|_2}{b_2} \right]
   \le b_4 \| s_m \|_2 \| h_m \|_2
   + b_8 ( \| Q_m \nabla l_m \|_2 + b_0 \| s_m \|_2 ) \| h_m \|_2
   + \rho b_9 \| s_m \|_2 \| h_m \|_2 . \]

If we use (5.2.7) and the local assumptions, we get

\[ r_m \frac{\| s_m \|_2}{2 b_1} \min \left[ 1 , \frac{e_5}{b_2} \right] \le b_4 \| s_m \|_2
   + \left[ b_8 ( e_0 + b_0 ) + \rho b_9 \right] \| s_m \|_2 , \]

where $b_4$ is as in Lemma (4.10). Hence,

\[ \frac{r_m}{2 b_1 b_2} \min [ b_2 , e_5 ] \le e_4 + \rho b_9 , \]

where $e_4$ is as in Lemma (5.3). Set

\[ N = [ e_4 + \rho b_9 ] \frac{2 b_1 b_2}{\min [ b_2 , e_5 ]} . \]

Since $N$ is independent of $m$, it is an upper bound of the sequence $\{ r_{k_j} \}$,
contradicting the assumption that the sequence $\{ r_{k_j} \}$ has no upper bound. This
proves the lemma.  ■


From the last lemma, we can conclude that for all $k$, $1 \le r_k \le r_*$ where
$r_*$ is a constant independent of $k$.

Since, if $r_k$ increases, it will increase by a quantity $\ge \rho$, the number
of iterations at which the penalty parameter increases is finite. Hence, there exists
a constant $\bar{k}$ such that $r_k = r_{\bar{k}}$ for all $k \ge \bar{k}$.
The following theorem shows that the algorithm is well defined in the sense
that at any iteration either the point $( x_k , \lambda_k )$ is a Kuhn-Tucker point or the
algorithm will always find an acceptable step.


Theorem  (5.6)

Under the local assumptions, either the point $( x_k , \lambda_k )$ is a Kuhn-Tucker point
or we always find an acceptable step, i.e. the condition $Ared_{k+j} / Pred_{k+j} \ge \eta_1$ will be
satisfied for some $j$.

Proof 

If the point $( x_k , \lambda_k )$ is a Kuhn-Tucker point, then there is nothing to prove.
Hence, consider the case when the point $( x_k , \lambda_k )$ is not a Kuhn-Tucker point.

First, we assume that $\| h_k \|_2 > 0$. From Corollary (4.2), we have

\[ Pred_k \ge \frac{r_k \| h_k \|_2}{2 b_1} \min \left[ \Delta_k , \frac{\| h_k \|_2}{b_2} \right] . \]

As $\Delta_k$ gets smaller, we get

\[ Pred_k \ge \frac{r_k \| h_k \|_2}{2 b_1} \Delta_k , \]

and since from Corollary (4.7),

\[ | Ared_k - Pred_k | \le a_0 r_k \Delta_k^2 , \]

then, we have

\[ \frac{| Ared_k - Pred_k |}{Pred_k} \le \frac{2 a_0 b_1 \Delta_k}{\| h_k \|_2} . \]

That is,

\[ \left| \frac{Ared_k}{Pred_k} - 1 \right| \le \frac{2 a_0 b_1}{\| h_k \|_2} \Delta_k . \]

Now, as $\Delta_k$ again gets smaller, the quantity $| Ared_k / Pred_k - 1 |$ approaches 0 and
hence the condition $Ared_k / Pred_k \ge \eta_1$ will be met after a finite number of trials.

Now, assume that $\| h_k \|_2 = 0$. Since we are considering the case
when the point $( x_k , \lambda_k )$ is feasible but not a Kuhn-Tucker point, we have
$\| P_k \nabla l_k \|_2 > 0$. From Lemma (4.10) we have

Pr'dk  >  1  1 1  Pt(V 4  +  Bt$l )  1 12  min  [  3*  ,  i! / PA ■  +  M  l  1  k  ] 

4  2  o0 

—  ^4  I  M*  I  1 2  i  I  I  1 2  —  I  (V/*  +  BkSk)T  hk  | 

+  rk  [  1 1  hk  1 1|  —  |  |  hk  4-  Vhksk  1 122  ]  . 

Because $\| h_k \|_2 = 0$, we have $\| s_k^n \|_2 = 0$ and $\bar{\Delta}_k = \Delta_k$. Thus,

\[ Pred_k \ge \frac{1}{4} \| P_k \nabla l_k \|_2 \min \left[ \Delta_k , \frac{\| P_k \nabla l_k \|_2}{2 b_0} \right] . \]

As $\Delta_k$ gets smaller, we get

\[ Pred_k \ge \frac{1}{4} \| P_k \nabla l_k \|_2 \Delta_k . \]

This implies, using Corollary (4.7), that

\[ \frac{| Ared_k - Pred_k |}{Pred_k} \le \frac{4 a_0 r_k}{\| P_k \nabla l_k \|_2} \Delta_k . \]

So, as $\Delta_k$ gets smaller, the quantity $| Ared_k / Pred_k - 1 |$ approaches 0, and hence
the condition $Ared_k / Pred_k \ge \eta_1$ will be met after a finite number of trials. This
completes the proof.  ■

The last theorem implies that if, at some iteration indexed $k$, the algorithm
loops infinitely without finding an acceptable step, then the point $( x_k , \lambda_k )$ is
necessarily a Kuhn-Tucker point.

5.2.2)  Sufficient  Decrease  in  The  Model 

In this section we prove Lemma (5.7), which states that locally the predicted
reduction in the model is at least proportional to the square of the 2-norm of the
step.


Lemma  (5.7) 


Under the local assumptions, if $s_k$ is the step generated by the algorithm, then,
for $k$ large enough, there exists a constant $e_6$ such that

\[ Pred_k \ge e_6 \| s_k \|_2^2 . \]


Proof 
On the other hand, when $\| h_k \|_2 > e_5 \| s_k \|_2$, we have from Corollary (4.2)
that

\[ Pred_k \ge \frac{\| h_k \|_2}{2 b_1} \min \left[ \Delta_k , \frac{\| h_k \|_2}{b_2} \right]
   \ge \frac{1}{2} \frac{e_5 \| s_k \|_2^2}{b_1} \min \left[ 1 , \frac{e_5}{b_2} \right] . \]

Take $e_6 = \min \left\{ \frac{e_3}{16 b_0} \min [ \sqrt{3} b_0 , e_3 ] , \frac{e_5}{2 b_1 b_2} \min [ b_2 , e_5 ] \right\}$; we get

\[ Pred_k \ge e_6 \| s_k \|_2^2 . \]

Hence we get the desired result.


5.2.3)  The  Asymptotic  Rate  of  Convergence 

In this section we will assume that for each $k$, $B_k$ is the exact Hessian of
the Lagrangian at the point $( x_k , \lambda_k )$.

We start this section by proving Theorem (5.8) which is needed to study the
local rate of convergence. In Theorem (5.9) we prove that under the local assumptions,
for $k$ sufficiently large, the SQP steps will always be taken. So, the strategy of
taking $s^{QP}$, if possible, will make our algorithm, for large $k$, produce the SQP
steps. Hence, for large $k$, the steps are the SQP steps and consequently the
convergence rate of $( x_k , \lambda_k )$ to $( x_* , \lambda_* )$ is q-quadratic.


Theorem  (5.8) 


Under the local assumptions, if $s_k$ is the step generated by the algorithm, then
there exists $k_1$ such that for all $k \ge k_1$, we have

\[ \frac{Ared_k}{Pred_k} \ge \eta_2 . \]

That is, for all $k$ large enough, the trust region radius, $\Delta_k$, will be inactive.


Proof 


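Since, for some $\xi \in ( 0 , 1 )$,

\[ \begin{aligned}
L ( x_k + s_k , \lambda_k + \Delta\lambda_k ; r_k ) &= L ( x_k , \lambda_k + \Delta\lambda_k ; r_k ) + \nabla_x L ( x_k , \lambda_k + \Delta\lambda_k ; r_k )^T s_k \\
&\quad + \tfrac{1}{2} s_k^T \nabla_x^2 L ( x_k + \xi s_k , \lambda_k + \Delta\lambda_k ; r_k ) s_k \\
&= L ( x_k , \lambda_k ; r_k ) + \Delta\lambda_k^T ( h_k + \nabla h_k^T s_k ) \\
&\quad + \nabla_x l ( x_k , \lambda_k )^T s_k + \tfrac{1}{2} s_k^T B_k s_k \\
&\quad + r_k \left[ \| h_k + \nabla h_k^T s_k \|_2^2 - \| h_k \|_2^2 \right] \\
&\quad + \tfrac{1}{2} s_k^T \left[ \nabla_x^2 L ( x_k + \xi s_k , \lambda_k + \Delta\lambda_k ; r_k ) - \nabla_x^2 L ( x_k , \lambda_k + \Delta\lambda_k ; r_k ) \right] s_k \\
&\quad + \tfrac{1}{2} s_k^T \nabla^2 h_k \Delta\lambda_k s_k + r_k s_k^T \nabla^2 h_k h_k s_k .
\end{aligned} \]

Hence,

\[ \begin{aligned}
Ared_k &\ge - \nabla l_k^T s_k - \tfrac{1}{2} s_k^T B_k s_k - \Delta\lambda_k^T ( h_k + \nabla h_k^T s_k ) \\
&\quad + r_k \left[ \| h_k \|_2^2 - \| h_k + \nabla h_k^T s_k \|_2^2 \right] \\
&\quad - o ( \| s_k \|_2^2 ) - \tfrac{1}{2} | s_k^T \nabla^2 h_k \Delta\lambda_k s_k | - r_k | s_k^T \nabla^2 h_k h_k s_k | .
\end{aligned} \]

Using Lemma (5.7), for $k$ large enough, we have

\[ \frac{Ared_k}{Pred_k} \ge 1 - \frac{1}{e_6} \left[ \frac{o ( \| s_k \|_2^2 )}{\| s_k \|_2^2} + \frac{| s_k^T \nabla^2 h_k \Delta\lambda_k s_k |}{\| s_k \|_2^2} + \frac{r_k | s_k^T \nabla^2 h_k h_k s_k |}{\| s_k \|_2^2} \right] . \]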

But, since by the local assumptions $\| s_k \|_2 \to 0$, $\| \nabla l_k \|_2 \to 0$, and
$\| h_k \|_2 \to 0$, the quantities

\[ \frac{o ( \| s_k \|_2^2 )}{\| s_k \|_2^2} , \qquad \frac{| s_k^T \nabla^2 h_k \Delta\lambda_k s_k |}{\| s_k \|_2^2} , \qquad \text{and} \qquad \frac{r_k | s_k^T \nabla^2 h_k h_k s_k |}{\| s_k \|_2^2} \]

are arbitrarily small for $k$ sufficiently large. Hence, there
exists an integer $k_1$ such that for all $k \ge k_1$, we have

\[ 1 - \frac{1}{e_6} \left[ \frac{o ( \| s_k \|_2^2 )}{\| s_k \|_2^2} + \frac{| s_k^T \nabla^2 h_k \Delta\lambda_k s_k |}{\| s_k \|_2^2} + \frac{r_k | s_k^T \nabla^2 h_k h_k s_k |}{\| s_k \|_2^2} \right] \ge \eta_2 . \]

Consequently, for all $k \ge k_1$, we have

\[ \frac{Ared_k}{Pred_k} \ge \eta_2 . \qquad (5.3.1) \]


The last inequality implies that the trust region radius $\Delta_k$ for $k \ge k_1$ is
updated according to the rule

\[ \Delta_{k+1} = \max [ \Delta_k , \alpha_3 \| s_k \|_2 ] . \]

Hence, $\Delta_k \ge \Delta_{k_1}$ for all $k \ge k_1$ and using the assumption that
$\| s_k \|_2 \to 0$ we can conclude that there exists an integer $k_2 \ge k_1$ such that
the trust region is inactive for all $k \ge k_2$. Hence we get the desired result.
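
The radius rule used in this proof can be written as a small function. The sketch below is hedged: only the branch for $Ared_k / Pred_k \ge \eta_2$, namely $\Delta_{k+1} = \max [ \Delta_k , \alpha_3 \| s_k \|_2 ]$, is taken from the text. The other two branches (keeping the radius for a moderately good step, and shrinking to $\alpha_1 \| s_k \|_2$ after a rejection) and all numerical constants are assumptions chosen for illustration; the actual rule is that of Algorithm (3.1.2).

```python
def update_radius(delta, step_norm, ratio,
                  eta1=0.01, eta2=0.75, alpha1=0.25, alpha3=2.0):
    """Return the next trust region radius (constants are illustrative)."""
    if ratio >= eta2:
        # Very successful step: the radius never decreases (rule quoted in the text).
        return max(delta, alpha3 * step_norm)
    if ratio >= eta1:
        # Acceptable step: keep the radius (assumed behavior).
        return delta
    # Rejected step: shrink relative to the step length (assumed behavior).
    return alpha1 * step_norm
```

Once every ratio stays above `eta2`, the returned radius is nondecreasing, while the steps themselves tend to zero, which is why the trust region eventually becomes inactive.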


Theorem  (5.9) 

Under the local assumptions, for $k$ sufficiently large, the SQP steps will be taken
and consequently $( x_k , \lambda_k )$ converges to $( x_* , \lambda_* )$ q-quadratically.

Proof 

From the last theorem, $\Delta_k \ge \Delta_{k_1}$ for all $k \ge k_1$. Now suppose there exists an
integer $k_3 \ge k_2$ such that $s_k \ne s_k^{QP}$ for all $k \ge k_3$. This implies that, for all
$k \ge k_3$,

\[ \| s_k^{QP} \|_2 > \Delta_k \ge \Delta_{k_1} , \]

which contradicts the fact that $\| s_k^{QP} \|_2 \to 0$. Therefore, there exists at least
one step $s_{k_j} = s_{k_j}^{QP}$ where $k_j \ge k_3$.

Let $k_4$ be the smallest integer greater than $k_3$ such that $s_{k_4} = s_{k_4}^{QP}$, and such
that the SQP method generates steps that satisfy

\[ \| u_{k+1} \|_2 \le m_1 \| u_k \|_2^2 , \]

where $u_k$ denotes the error in the iterate $( x_k , \lambda_k )$ and $m_1$ is a constant.


But the SQP steps $\{ s_k^{QP} \}$ converge r-quadratically. This implies that, for
all $k \ge k_4$, we have

1 1  s?P  1 12  <  (a2)*4  . 

where  m2  ,  a  are  constants  and  a  <  1  .  This  means  that  if  we  choose  k4 
sufficiently  large  such  that 

™2  (a2)*'  <  A k 4  . 

Then, $\| s_{k_4}^{QP} \|_2 \le \Delta_{k_4}$ and for all $k \ge k_4$, we have

\[ \| s_k^{QP} \|_2 \le \Delta_{k_4} . \]

But since, for $k \ge k_1$, we have $\Delta_k \le \Delta_{k+1}$, then

\[ \| s_{k_4+1}^{QP} \|_2 \le \Delta_{k_4} \le \Delta_{k_4+1} . \]

The last inequality and the fact that for all $k \ge k_2$ all the steps are acceptable
steps imply that

\[ s_{k_4+1} = s_{k_4+1}^{QP} . \]

By induction, for all $k \ge k_4$, we can conclude that

\[ s_k = s_k^{QP} . \]

This means that the sequence $\{ x_k , k \ge k_4 \}$ generated by the algorithm is the
sequence of the SQP iterates and consequently the local rate of convergence is
q-quadratic.  ■
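
The q-quadratic rate that the algorithm inherits from SQP can be observed on a small example. The sketch below is illustrative only and is not the trust region algorithm of this thesis: it runs plain SQP steps (Newton's method on the Kuhn-Tucker system with the exact Hessian of the Lagrangian $l = f + \lambda h$) on the toy problem of minimizing $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, whose solution is $x_* = (-1,-1)$ with $\lambda_* = 1/2$. The toy problem, the starting point, and the names `sqp_iterates` and `errors` are assumptions made for the demonstration.

```python
import numpy as np


def sqp_iterates(x, lam, iters=8):
    """Run full SQP (Newton-KKT) steps on the toy problem and record the iterates."""
    x = np.asarray(x, dtype=float)
    history = [(x.copy(), float(lam))]
    for _ in range(iters):
        grad_f = np.array([1.0, 1.0])            # gradient of f(x) = x1 + x2
        grad_h = 2.0 * x                          # gradient of h(x) = x1^2 + x2^2 - 2
        h = float(x @ x - 2.0)
        hess_l = 2.0 * lam * np.eye(2)            # exact Hessian of the Lagrangian
        kkt = np.block([[hess_l, grad_h[:, None]],
                        [grad_h[None, :], np.zeros((1, 1))]])
        rhs = -np.concatenate([grad_f + lam * grad_h, [h]])
        step = np.linalg.solve(kkt, rhs)          # (s, delta_lambda)
        x = x + step[:2]
        lam = lam + step[2]
        history.append((x.copy(), float(lam)))
    return history


def errors(history, x_star, lam_star):
    """Euclidean distance of each (x, lambda) pair from the solution pair."""
    return [float(np.hypot(np.linalg.norm(x - x_star), lam - lam_star))
            for x, lam in history]
```

Started from $x = (-1.5, -0.5)$ and $\lambda = 0.4$, the recorded errors shrink roughly like the square of the previous error, which is the behavior asserted in Theorem (5.9) once the SQP steps are accepted.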
CHAPTER  SIX

CONCLUDING  REMARKS 

We have considered a trust region algorithm for solving the equality constrained
optimization problem. This algorithm is a variant of the 1984
Celis-Dennis-Tapia algorithm. We have presented a global and local convergence
analysis for this algorithm.

Our  global  convergence  theory  is  sufficiently  general  that  it  holds  for  any 
algorithm  that  generates  steps  that  give  at  least  a  fraction  of  Cauchy  decrease  in 
the  quadratic  model  of  the  constraints. 

The subproblem that has to be solved at each iteration is not in general the
successive quadratic programming subproblem. However, we have shown that
under mild assumptions, in the neighborhood of the minimizer, the algorithm will
reduce to the standard SQP algorithm; hence the local rate of convergence of
SQP is maintained.

The  augmented  Lagrangian  function  was  used  as  a  merit  function.  A  scheme 
for  updating  the  penalty  parameter  was  presented.  The  behavior  of  the  penalty 
parameter  was  discussed. 

For future work, there are many questions that should be answered:

Although intensive numerical experience with the CDT algorithm was
reported by Celis, Dennis and Tapia (1984), Celis (1985) and Celis, Dennis and
Tapia (1987), we believe that the implementation of the algorithm must be
refined. In particular, an efficient algorithm for solving the CDT subproblem is
needed. This will require a closer look at the CDT subproblem and the
characteristics of its solution. Currently, this is the topic of much research, e.g. Yuan
(1987), but the problem has not been solved.

A related important question is how to approximate the Hessian of the
Lagrangian so that the resulting algorithm is efficient.

Another important research topic that should be considered is how to
generalize this approach to handle the inclusion of nonlinear inequality constraints in
the problem.